Foreword by Melanie Mitchell


The idea that machines could one day exhibit ‘general intelligence’ has both inspired 
and confounded the field of AI since its inception. ‘Inspired’ by drawing talented 
people into the field to achieve one of humanity’s grandest challenges. ‘Confounded’ 
since there is no widely agreed-upon definition of what ‘general intelligence’ actually
means or a definitive list of the properties it entails. 

The pursuit of general intelligence in AI has been, in large part, the story of 
moving goal posts. Variously called ‘human-level AI’, ‘strong AI’, ‘AGI’, and even 
‘superintelligence’, the criteria for machines exhibiting general intelligence have
continually changed over time. Alan Turing, in his classic 1950 paper ‘Computing
Machinery and Intelligence’, proposed that a machine should be considered
intelligent (or ‘thinking’) if it could persuade a human judge, via conversation alone,
that it was human.
However, the rise of ever more capable chatbots (and the surprising propensity of 
humans to assign intentionality to machines) has shown that AI conversationalists 
that clearly lack general intelligence can easily fool humans in many cases. Early 
AI proponents believed that something like general intelligence could be captured 
in systems that heuristically followed symbolic rules. Many of the early founders of 
AI predicted in the 1960s that human-level AI was only 10, 15, or at most 20 years 
away. However, symbolic AI approaches, such as the optimistically named ‘General 
Problem Solver’, turned out to be far from general and often disastrously brittle. 

Intellectual board games like chess and Go have long been seen as a grand chal- 
lenge for AI systems, and many people believed that conquering them would require 
something like general intelligence. In 1958, AI pioneers Newell and Simon declared, 
‘If one could devise a successful chess machine, one would seem to have penetrated to 
the core of human intellectual endeavor’. And in 1997, the year that IBM’s Deep Blue 
defeated world chess champion Garry Kasparov, the New York Times conjectured 
about Go, ‘When or if a computer defeats a human Go champion, it will be a sign
that Artificial Intelligence is truly beginning to become as good as the real thing’. But
although Deep Blue and its counterpart AlphaGo are extraordinary achievements, 
neither is anywhere close to a general intelligence. 

Since the 2010s, the rise of deep learning (and its myriad successes) has once 
again encouraged optimism among some in the AI community that ‘true AI’ is close 
at hand. More recently, though, cracks have begun to appear in deep learning’s facade
of intelligence, and many have expressed serious doubts about the prospects of ‘big 
data’ approaches for developing general AI. 

This book is an attempt to clarify what kind of knowledge representation and 
information processing is needed for general intelligence in machines. The authors, 
inspired by theories of semantics as well as programming-language theory, stress the 
need for representations that exhibit both ‘algebraic compositionality’ and ‘strong 
typing’—properties of programming languages that allow for explicit propagation of 
arbitrary constraints, rapid adaptation to new inputs, and reflection (in which a process 
can examine its own behavior). This is in contrast to the representations formed by 
today’s deep learning systems, which seem to have none of these properties. 

Most people in AI would agree that even when machines exhibit what seems 
like intelligent behavior (for example, producing coherent translations between 
languages) these machines don’t have anything like the kind of understanding of 
their inputs or their own behavior that humans have. The notion of ‘understanding’ 
in machines is hard to pin down, but the authors of this book note that, at the very 
least, understanding must entail the ability to transfer what one has learned to new 
tasks, a capacity that today’s state-of-the-art AI systems still struggle with, but one
that is highly desired in automation. The authors frame this notion as ‘Work
on Command’: the ability to respond, in a reasonable time, to dynamic changes in
the specification of the task one is faced with. 

This book includes extensive discussion of additional abilities required for general 
intelligence—for example, abduction, analogy, and hypothesis generation. Capturing 
such abilities in AI systems in a general and humanlike way has been the subject of 
much research but little progress to date, in part due to the lack of progress in capturing 
the causal knowledge and reasoning that underlies them. Here the authors describe 
how such abilities have been implemented in a reference system that exhibits what 
they call Semantically Closed Learning (inspired by the concept of semantic closure 
in open-ended evolution, and incorporating further ideas from category theory). 

In short, this book provides an intriguing and provocative framework for thinking 
about what general intelligence is, and how its essential abilities might be attain- 
able by machines in an economically viable manner. The philosophy behind both 
programming-language theory and category theory plays key roles in the formal- 
ization and development of the main ideas. The authors also provide pointers to 
what research challenges lie open. Given the complexity and intricacy of the desti- 
nation, the road to general intelligence will be a bumpy one. This book gives a 
thought-provoking view of one pragmatic direction toward this goal. 


Santa Fe, NM Melanie Mitchell 


Foreword by David Spivak 


It’s no coincidence that companies have departments, that furniture has drawers and 
shelves, that bodies have organs, that code bases have modules: things organize for a 
reason. As Herbert Simon points out in ‘The Sciences of the Artificial’, organizing offers
exponential compression in the search space for solving problems. By carving nature 
at its joints, by elegantly articulating and factoring the space of possibilities, and by 
finding the right abstractions, we enable ourselves to handle new situations in stride. 

Mathematics is the marketplace for humanity’s clearest and most reliable abstrac- 
tions. And within mathematics, category theory is unparalleled in its ability to factor 
ideas into constituent parts that fit together frictionlessly. In this book, the authors 
emphasize a subdiscipline of category theory called bidirectional transformations, 
also known by names such as polynomial functors and optics, which I join them 
in regarding as essential for expressing the crucial role of feedback in intelligent 
systems. Finding categorical abstractions to handle new situations, including creating 
more intelligent systems, requires a widespread research program that is only now 
starting to emerge. As far as I know, this is the first book to confidently assert this 
need and make a real stab at solving it. 


Berkeley, CA David Spivak 
Chapter 1
Introduction


Theories may be equivalent in all their predictions and are 
hence scientifically indistinguishable. However, different views 
suggest different kinds of modifications which might be made 
and hence are not equivalent with respect to the hypotheses one 
generates from them. 


Richard P. Feynman, Nobel Lecture 1965 


The rise of civilization is synonymous with the creation of tools that extend the intel- 
lectual and physical reach of human beings [133]. The pinnacle of such endeavours is 
to replicate the flexible reasoning capacity of human intelligence within a machine, 
making it capable of performing useful work on command, despite the complexity 
and adversity of the real world. In order to achieve such Artificial Intelligence (AI),
a new approach is required: traditional symbolic AI has long been known to be too
rigid to model complex and noisy phenomena, and the sample-driven approach of
Deep Learning cannot scale to the long-tailed distributions of the real world.

In this book, we describe a new approach for building a situated system that 
reflects upon its own reasoning and is capable of making decisions in light of its 
limited knowledge and resources. This reflective reasoning process addresses the 
vital safety issues that inevitably accompany open-ended reasoning: the system must 
perform its mission within a specifiable operational envelope. 

We take a perspective centered on the requirements of real-world AI, in order 
to determine how well mainstream techniques fit these requirements, and propose 
alternative techniques that we claim have a better fit. To reiterate: by AI we mean the 
property of a machine that exhibits general-purpose intelligence of the kind exhib- 
ited by humans, i.e., enjoying the ability to continually adapt existing knowledge 
to different domains. The endeavor to create intelligent machines was definitively 
proposed as such in the 1950s [220], although the concept of a humanoid automaton 
recurs throughout recorded history. Due to the sheer magnitude and ambition of the 
project, there have naturally been many bumps in the road: not only the infamous 
‘AI winter’ [202], but also periods where the endeavor’s vision and direction have 
been clouded by the prospects of short-term success. 
AI for Automation 


Given that substantial resources are required to create AI, it cannot be done on a 
whim. Therefore the shape of AI (at least in its initial incarnation) will be strongly 
influenced by the return anticipated by those investing in it. That is, to answer “How 
to build AI?”, we must ask why we want AI in the first place, i.e., what is the business 
case for a machine with general intelligence? 

Philosophical considerations aside, intelligent machines are ultimately tools for 
implementing a new leap in automation. In practical automation settings, the gen- 
erality of a system is measured as the inverse of the cost of its deployment and 
maintenance in a given environment/task space. At the low end of this spectrum 
are systems that depend on full specifications of their environments and tasks. Such 
systems are very costly to re-deploy when facing specification changes, possibly 
incurring the highest cost: that of a complete rewrite. At the high end are more gen- 
eral systems that re-deploy autonomously through continual open-ended adaptation 
and anticipation. 

The main functional requirement of general intelligence is therefore to control the 
process of adaptation. In this work, we claim that this can be achieved in a unified, 
domain-agnostic manner via the ability to ground arbitrary symbols (whether arising 
from end-user vocabulary or being synthesized by the system) in an explicit learned 
semantics. Hence, throughout this work, when we discuss symbols in reference to our 
proposed architecture, it is not in the sense of the a priori opaque logical predicates 
of ‘Good Old-Fashioned AI’, but rather follows in the footsteps of a collection of 
cyberneticists, psychologists and systems theorists [8, 67, 218, 253, 261, 269, 299] 
for whom “symbols are merely shorthand notation for elements of behavioral control
strategies” [49].

In practical terms, the endeavor of creating general intelligence therefore consists 
of building a template for a learning control system which can be re-targeted at an 
arbitrary environment, bootstrapping the control mechanisms with as little latency as 
possible, starting from small amounts of (incomplete or even faulty) knowledge. The 
system is then expected to discover further constraints on the fly, whether from a corpus
of ready-made knowledge, from experience acquired with or without supervision, or
from interacting with the environment, possibly under the sporadic guidance of
teachers and end-users.

Notwithstanding these business considerations, the creation of AI still relies on
good science, especially with regard to requirements engineering, with the initial
focus illustrated in Fig. 1.1. Although we set aside those requirements that are mostly
issues of hardware, paperwork, or procedures (e.g., constructing curricula for teaching
the system as well as its eventual operators), the fact that they must be addressed
and fulfilled imposes constraints on which scientific techniques can even be
considered. The requirement-centric perspective dictates which properties are important
for a technique to exhibit or avoid. For example, even legal requirements impinge
on techniques, such as when GDPR¹ demands transparency in automated decision-making,
which is more easily fulfilled when knowledge representation and reasoning
are not intrinsically black-box components or processes.

¹ General Data Protection Regulation (EU) 2016/679.

[Figure: the requirements Compositionality, Strong Typing, Reflection, Declarative Goals and Constraints, Non-stationarity, Anytime Operation, and Endogenous Situatedness, with pointers to Sects. 4.3 and 7.3]

Fig. 1.1 Theme development in Part I: a summary of the most pertinent engineering requirements
for constructing a general intelligence system of real-world use. Their importance is established
throughout the first part of this book and then leveraged to construct our proposed framework:
Semantically Closed Learning.


The Structure of this Book 


This book begins with a survey of historical (Chap. 2) and contemporary (Chap. 3) 
AI methodologies, discussing their strengths and weaknesses, from the perspective 
of their potential to support general intelligence. Machine learning (ML), notably 
deep- and reinforcement learning, has emerged as the dominant AI paradigm. There 
are certainly many valuable applications for which ML offers functionally good solu- 
tions, in particular for industrial applications where such techniques are used to build 
control systems beyond the reach of traditional software engineering. Nevertheless, 
it remains a feat of imagination to ascribe any meaningful notion of intelligence to 
any of these systems: the constraints and ambitions of machine learning and gen- 
eral intelligence research are simply orthogonal. Although machine learning is a 
valuable engineering technique, this fact is not to be confused with a claim that it 
might offer a path toward general intelligence. In Chaps. 4 and 5, we make a criti- 
cal appraisal of this claim, by contrasting deep learning and reinforcement learning 
techniques against key requirements of general intelligence—from the perspective 
of automation engineering, these are reified by the notion of ‘Work on Command’
in Chap. 6. 

The second part of the book is concerned with an alternative framework that
we claim better fulfills these requirements. There is increasing consensus that it is
necessary to combine the strengths of both symbolic and connectionist paradigms 
[59, 210]: the main advantage of symbolic approaches is the ready injection of 
domain knowledge, with the attendant pruning of hypothesis space. In contrast, the 
main advantage of connectionism is that it is (at least in principle) a tabula rasa. 




As has been argued by Marcus for many years [214], we also hold the view that 
general intelligence requires the recursively algebraic capacities of human reasoning. 
This motivated the research and associated reference architecture implementation 
we present in this book. This architecture has been implemented, and prototypes 
have been developed, addressing the domains of medical diagnosis, service robotics, 
and industrial process automation—empirical demonstrations will be the topic of 
subsequent works. In Chaps. 7-10, we define a framework for ‘Semantically Closed 
Learning’ which: 


• Describes an explicit (but nonetheless ‘universal’) recursive interpreter for a highly
generalized notion of algebraic reasoning.

• Represents the hierarchical causal structure of hypotheses as first-class objects.

• Defines a fine-grained and resource-aware attention mechanism, driven to favor
highly structured and stable hypotheses.

• Describes key reasoning heuristics using the generic and compositional vocabulary
of category theory, from the emerging perspective of ‘Categorical Cybernetics’ [42, 140].

• Defines a novel compositional mechanism, using lenses [88], an approach which
unifies conventional backpropagation, variational inference, and dynamic
programming, for the purpose of abductive reasoning over hybrid numeric-symbolic
expressions.

• Describes a minimal viable implementation design for second-order automation
engineering (system identification, synthesis, and maintenance) with guarantees
relevant to safety.
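The claim that lenses unify backpropagation with other bidirectional schemes can be illustrated with a minimal sketch, assuming the simplest possible setting: scalar functions whose backward pass carries derivatives. The class `Lens` and the helper `differentiable` are hypothetical names introduced here; the sketch shows only why composing lenses reproduces the chain rule, not the book's own mechanism for abduction over hybrid numeric-symbolic expressions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lens:
    """A bidirectional map on scalars (hypothetical illustration).

    forward:  input -> output
    backward: (original input, feedback on output) -> feedback on input
    """
    forward: Callable[[float], float]
    backward: Callable[[float, float], float]

    def then(self, other: "Lens") -> "Lens":
        """Sequential composition: apply self first, then other."""
        def fwd(a: float) -> float:
            return other.forward(self.forward(a))

        def bwd(a: float, feedback: float) -> float:
            b = self.forward(a)  # recompute the intermediate value
            return self.backward(a, other.backward(b, feedback))

        return Lens(fwd, bwd)


def differentiable(f: Callable[[float], float],
                   df: Callable[[float], float]) -> Lens:
    """Lift a function and its derivative into a lens whose backward
    pass multiplies incoming feedback by the local derivative."""
    return Lens(forward=f, backward=lambda x, dy: df(x) * dy)


square = differentiable(lambda x: x * x, lambda x: 2 * x)
triple = differentiable(lambda y: 3 * y, lambda y: 3.0)

model = square.then(triple)      # x -> 3 * x**2
print(model.forward(2.0))        # 12.0
print(model.backward(2.0, 1.0))  # 12.0, i.e. d/dx(3x^2) = 6x at x = 2
```

Because `then` builds the backward pass from the components' backward passes, gradients of arbitrarily long pipelines follow from composition alone; replacing derivatives with other kinds of feedback is what allows the same pattern to cover, for example, dynamic programming.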


Finally, in Chap. 11, we summarize our contribution, discuss research avenues, and 
conclude. 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
Part I
Requirements 


Chapter 2
Background


It’s all these black boxes you can’t open—see how each spends 
most of its time trying to defeat the other. 


Knuth [368] 


Recent years have seen an explosion in academic, industrial, and popular interest in 
AI, as exemplified by machine learning and primarily driven by the widely-reported 
successes of deep- and reinforcement learning (e.g. [314, 315, 351]). Deep learning 
is essentially predicated on the notion that, with a sufficiently large training set, the 
statistical correlations captured by training will actually be causal [310]. However, in 
the absence of convergence theorems to support this, it remains a hypothesis. Indeed, 
insofar as there is evidence, it increasingly indicates to the contrary, since the appli- 
cation of enormous volumes of computational effort has still failed to deliver models 
with the generalization capability of an infant. There is accordingly increasing dis- 
cussion about what further conceptual or practical insights might be required [57]. 
At the time of writing, the very definition of deep learning is in flux, with one Turing 
Award laureate defining it as “a way to try to make machines intelligent by allowing
computers to learn from examples”¹ and another as “differentiable programming”².
We argue in the following that deep learning is highly unlikely to yield intelligence, 
at the very least while it equates intelligence with “solving a regression problem”. 
Specifically, we claim that it is necessary to adopt a fundamentally different perspec- 
tive on the construction of inferences from observations, and that this is in accordance 
with a fundamental revolution in the philosophy of science: Karl Popper’s celebrated 
solution to ‘The Problem of Induction’ [268]. 


¹ https://blogs.microsoft.com/ai/a-conversation-ai-pioneer-yoshua-bengio.


² https://www.facebook.com/yann.lecun/posts/10155003011462143.


In subsequent chapters, we first describe the significant challenges for machine 
learning. We then argue that current approaches are unlikely to be able to address them 
unless the scope of the learning framework is considerably widened, and that it will
ultimately be necessary for this framework both to be situated and to support reflective
reasoning. We compare our proposed conceptual framework for learning with that 
of reinforcement learning, with respect to the features necessarily associated with 
general intelligence. We propose a roadmap towards general intelligence, progressing 
via the notion of work on command to the property of semantic closure. This property 
is described fully in Chap. 7, but in précis, it equips an agent with the ability to 
determine causality and induce hierarchical representations in a manner that is absent 
from traditional machine learning approaches. 


2.1 What we Mean by General Intelligence 


As convincingly argued by Wang [357]: before any in-depth discussion of roadmaps 
and obstacles, we must define the desired destination of general intelligence. Legg 
and Hutter [194] give a comprehensive and insightful tour of various definitions of 
intelligence. They distill many of these into the following observations: 


Intelligence is not the ability to deal with a fully known environment, but rather the ability to 
deal with some range of possibilities which cannot be wholly anticipated. What is important 
then is that the individual is able to quickly learn and adapt so as to perform as well as 
possible over a wide range of environments, situations, tasks and problems. 


These observations, along with others, culminate in the mathematical formalism of
universal intelligence. This defines the intelligence of an agent as a weighted sum of
the expected returns (sums of rewards) it obtains across all computable environments,
with each environment weighted so that simpler environments count for more.
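For reference, Legg and Hutter’s universal intelligence measure can be written as follows, where $\pi$ is the agent, $E$ is the set of computable environments with bounded total reward, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward obtained by $\pi$ in $\mu$:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

The factor $2^{-K(\mu)}$ is what gives simpler environments more weight. Since $K$ is uncomputable, $\Upsilon$ serves as a theoretical definition rather than a quantity that can be evaluated directly. Note also that $V^{\pi}_{\mu}$ presupposes the a priori reward function that the following paragraph calls into question.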

The preceding definition is problematic in that it assumes the existence of an a 
priori reward function. In natural organisms, any such reward function is assumed 
to be provided by a combination of innate and cultural mechanisms. For artificial 
systems, the implied complexity of a constructed reward function becomes a con- 
cern. In the extreme, if the reward function were to be completely arbitrary, then it 
effectively characterizes the environment as pure noise, with no useful features for a 
learner to exploit. Conversely, once the reward function is structured so as to reflect 
regularities in the environment, then at least some (and potentially a great deal) of 
the purported intelligence of the agent is actually provided by the reward function 
itself. 

In subsequent sections, we argue that some of the issues with ML are actually an 
artifact of this kind of ‘narrow framing of the problem’, in which (1) human expertise 
is required to represent each specific problem in a manner amenable to ML and (2) 
this framing then only makes a highly impoverished form of feedback available to the
learner, typically in the form of a scalar numeric reward. We argue that it is necessary
to replace black-box reward functions with richer representations that are suitable
for reflective reasoning. 

In pursuit of greater generality than can be provided by an a priori reward function, 
we must therefore adopt the wholly pragmatic perspective of the following value 
proposition. 


For all practical purposes, general intelligence is a necessary property of a system
which:

• Performs work on command, i.e., responds with tolerable latency to dynamic
changes in goal specification and environmental conditions.

• Scales to real-world concerns.

• Respects safety constraints.

• Is explainable and auditable.


2.2 Science as Extended Mind 


There is clearly a huge gap between the software which enables facial or gait recog- 
nition and the yet-to-be-realized technology which will allow safe and trustworthy 
autonomous vehicles or factories. One can likewise consider the reality gap between 
audio-activated digital assistants and fully-fledged household robots. There exist 
countless other examples of roles that current AI techniques are incapable of fulfill- 
ing. In roles where humans are currently irreplaceable, what traits enable them to 
meet the demands of these roles? The generality of human intelligence is evident in 
many ways: 


• Humans can handle multiple objectives simultaneously and can typically order
activities so as to meet these objectives relatively efficiently.

• Humans can learn skills without forgetting those previously learned. They can also
make efficient use of related skills to bootstrap their learning process and minimize
this effort.

• Humans can explain their decision-making in terms of relevant causal factors
and ‘locally consistent’ frameworks of thinking, which means that a recipient of
the explanation (perhaps also their subsequent self) can understand, verify, and
possibly rectify the steps taken to reach conclusions.

• Humans can be told what is desired directly as a goal rather than needing to
iteratively try behavior in the hope of optimizing some sampled metric.

• Humans can be told what is forbidden and/or constrained and they can avoid such
situations without needing to physically interact with (i.e. ‘sample’) the
environment, assuming relevant grounded world knowledge.

• Humans can gracefully adjust their cognitive resource usage between perception,
action, and learning rather than having rigid boundaries between them.³

• Humans can operate in multi-agent settings, mostly through being able to
effectively model other agents’ trajectories based on their perceived intentions and
behavioral patterns.

• Humans can do all of the above in the real world, perhaps with a curriculum, but
not needing a high-fidelity resettable/reversible simulation within which to learn.


For the purposes of this work, the above list of traits will be considered as a 
set of necessary emergent capabilities of general intelligence. With this motivation, 
we claim that a system which exhibits these traits must satisfy the requirements 
summarized in Fig. 1.1. 

While the human mind is the most immediate exemplar for general intelligence, 
we believe there are strong reasons to consider that the scientific method is better 
suited to provide a template for its implementation. As Rodney Brooks has famously 
observed [37], insights obtained via mental introspection might cause us to be deeply 
misled about the nature of intelligence. In contrast, the adoption of the scientific 
method yields falsifiable statements about the physical world. This can be seen as 
providing an ‘extended mind’ [50]: an externalized artifact with verifiable
properties that can directly inform the design of general intelligence architectures. Given the
inevitable concerns about ‘AI alignment’,⁴ such verifiability is of particular
importance in obtaining measures of safety. Hence, we believe that the path to general
intelligence (at the very least, in a form capable of respecting safety concerns) lies in 
the attempt to automate the scientific method, from the perspective of an embodied 
reasoner with real-world concerns of deadlines and resource availability. 

Recent years have seen increasing emphasis on causality in machine learning. 
Causality is essential for building reasoning systems as it is a stronger criterion 
than merely statistical correlation. The relevance of causality to AI, first argued
convincingly by Pearl [254], has since been acknowledged by Schölkopf, Bengio, and
others. One of the key ideas is the ‘ladder of causality’, which frames inference as
situated on three ‘rungs’: the observational, interventional, and counterfactual
settings. Statistical learning from fixed datasets operates solely on the observational
rung: training on data generated only by an external process which the model does
not affect. Interventions imply the ability to set the values of certain variables,
overriding the natural external processes, in order to generate informative data; for
example, double-blind experimental designs with control groups. The most demanding
yet powerful application of causality is counterfactual reasoning, where inferences
are drawn based on variable values which were never observed but generated through
interventions in a model; for example, alternate history timelines.
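The gap between the first two rungs can be made concrete with a small sketch, assuming a hypothetical two-variable world in which a hidden common cause Z drives both X and Y, and X has no effect on Y at all. Observationally, X and Y are perfectly correlated; intervening on X nevertheless leaves Y untouched. All names and probabilities here are illustrative.

```python
from fractions import Fraction

# Hypothetical toy world: hidden common cause Z drives both X and Y;
# X has no causal effect on Y.
WORLDS = [{"Z": 0}, {"Z": 1}]   # exogenous settings
PRIOR = Fraction(1, 2)          # each world is equally likely


def run(world, do_x=None):
    """Evaluate the structural assignments, optionally overriding X."""
    z = world["Z"]
    x = z if do_x is None else do_x  # X := Z, unless intervened upon
    y = z                            # Y := Z (ignores X entirely)
    return x, y


def p_y1_given_x1():
    """Rung 1 (observational): condition on having seen X = 1.
    Counting matching worlds is valid here because the prior is uniform."""
    seen = [run(w) for w in WORLDS]
    matches = [y for x, y in seen if x == 1]
    return Fraction(sum(matches), len(matches))


def p_y1_do_x1():
    """Rung 2 (interventional): force X = 1 in every world."""
    return sum(PRIOR * run(w, do_x=1)[1] for w in WORLDS)


print(p_y1_given_x1())  # 1
print(p_y1_do_x1())     # 1/2
```

A learner confined to the observational rung would conclude that setting X = 1 guarantees Y = 1; only the interventional query reveals that Y follows its prior regardless of X.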

³ Whilst not necessarily under conscious control, nonetheless a property of human cognition overall.
⁴ The quest for confidence that a general intelligence won’t attempt to turn everything into paperclips [30].

Pearl also introduced the Structural Causal Model (SCM), which is a directed
acyclic graph structure specifically designed to enable users to operate on all three
rungs of the ladder of causality. A key idea in the SCM formalism is that dependencies
between variables are framed as probabilistic functions rather than simply statistical
dependence. This is better aligned with a physical interpretation of observations,
namely that they are caused by physical processes over time. The variables $X_i$
are determined by the structural assignments:


$$X_i = f_i(\mathrm{PA}_i, U_i), \qquad (i = 1, \ldots, n)$$


where $\mathrm{PA}_i$ are the parent nodes of $X_i$. The functions are probabilistic due to $U_i$,
which are exogenous noise variables that are jointly independent of one another.
If there were dependencies, they could be explained by forming yet more causal 
relationships (as per the common cause principle), and so noise must be modeled as 
independent. 

Interventions in SCMs are defined as (temporarily) setting $f_i$ to be a constant.
Importantly, the distributions of the parent nodes are unaffected by interventions on
children, since the child’s dependence on its parents is effectively severed; this differs
from standard Bayesian networks. Instances of the latter only encode conditional
independence relationships, with edge directions carrying no causal meaning, and so
dependence between nodes persists when the value of one is fixed by conditioning. The
ability to perform interventions also allows for principled counterfactual reasoning in
SCMs. If we have observed some value for a node $X_i$, we can use abduction to estimate
the value of $U_i$, and then, after intervening on its parents $\mathrm{PA}_i$, re-apply the
inferred exogenous noise in order to produce a counterfactual inference. There are
ubiquitous problems which require counterfactual reasoning that are consequently
intractable for purely statistical models [254]. Given their apparent completeness for
causal modeling, the modern problem of causal discovery involves inferring the
topology of an SCM which can accurately describe the system.
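The three-step counterfactual recipe described above (abduction, intervention, prediction) can be sketched for a hypothetical two-variable linear SCM with $X := U_X$ and $Y := 2X + U_Y$. The model, its coefficient, and the method names are assumptions chosen for illustration; additive noise is what reduces abduction to a single subtraction here.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Hypothetical SCM:  X := U_x,  Y := slope * X + U_y."""
    slope: float = 2.0

    def f_x(self, u_x: float) -> float:
        return u_x

    def f_y(self, x: float, u_y: float) -> float:
        return self.slope * x + u_y

    def sample(self, u_x, u_y, do_x=None, do_y=None):
        """Evaluate the assignments in topological order. An intervention
        replaces a variable's function by a constant."""
        x = self.f_x(u_x) if do_x is None else do_x
        y = self.f_y(x, u_y) if do_y is None else do_y
        return x, y

    def counterfactual_y(self, x_obs, y_obs, x_new):
        # 1. Abduction: infer the exogenous noise consistent with the evidence.
        u_x = x_obs
        u_y = y_obs - self.slope * x_obs
        # 2. Action: intervene on the parent X.
        # 3. Prediction: re-apply the inferred noise in the modified model.
        _, y = self.sample(u_x, u_y, do_x=x_new)
        return y


scm = LinearSCM()
# "We saw X = 1, Y = 3. What would Y have been, had X been 2?"
print(scm.counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=2.0))  # 5.0
# Intervening on the child leaves the parent untouched:
print(scm.sample(u_x=1.0, u_y=1.0, do_y=10.0))                # (1.0, 10.0)
```

The second call illustrates the asymmetry noted in the text: setting Y by intervention changes nothing upstream, whereas merely conditioning on Y = 10 would change our beliefs about X.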

Despite widespread interest in the use of SCMs, it is vital to appreciate that, within 
scientific practice, causality is best understood as being only part of a contextualized 
process of situated, bidirectional inference. Hence we take the deeper view of science 
as the construction of statements which provide a concise and consistent description 
of possible worlds. As recently observed by David Deutsch: 


Finding causal theories is necessary but not sufficient. We need explanatory theories. 
“Mosquitos cause malaria” is essential and useful but only 1% of the way to understanding 
and curing malaria. 


The essence of the scientific method is, of course, the interleaving of problem
formulation, hypothesis generation, experimentation, and analysis of results. Hence,
a core aspect of the proposed approach is the requirement for a reflective expression
language. Reflection is the property that statements can themselves be treated as
data, meaning that hypotheses about knowledge in the language can be evaluated as
first-class objects. This becomes salient for the process of hypothesis generation.
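To make reflection concrete, here is a minimal sketch assuming hypotheses are represented as nested tuples. The same object is used at the object level (evaluated to make a prediction) and at the meta level (inspected and rewritten to generate a new candidate hypothesis). The functions `evaluate`, `variables`, and `substitute` are hypothetical illustrations, not the expression language described later in the book.

```python
from typing import Union

# A hypothesis is plain data: a tuple (op, lhs, rhs), a variable name,
# or a numeric constant. Hypothetical stand-in for a reflective language.
Expr = Union[int, float, str, tuple]

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(e: Expr, env: dict) -> float:
    """Object level: use the hypothesis to make a prediction."""
    if isinstance(e, tuple):
        op, lhs, rhs = e
        return OPS[op](evaluate(lhs, env), evaluate(rhs, env))
    if isinstance(e, str):
        return env[e]
    return e


def variables(e: Expr) -> set:
    """Meta level: inspect the hypothesis itself rather than run it."""
    if isinstance(e, tuple):
        return variables(e[1]) | variables(e[2])
    return {e} if isinstance(e, str) else set()


def substitute(e: Expr, old: Expr, new: Expr) -> Expr:
    """Meta level: derive a variant hypothesis by rewriting a subterm."""
    if e == old:
        return new
    if isinstance(e, tuple):
        return (e[0], substitute(e[1], old, new), substitute(e[2], old, new))
    return e


h = ("add", ("mul", 2, "x"), 1)  # hypothesis: y = 2x + 1
print(evaluate(h, {"x": 3}))     # 7
print(variables(h))              # {'x'}
h2 = substitute(h, 2, 3)         # candidate revision: y = 3x + 1
print(evaluate(h2, {"x": 3}))    # 10
```

Because `h` is ordinary data, the system can reason about which variables a hypothesis depends on, or propose revisions to it, without first executing it.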

Concretely, for our purposes hypotheses are some (sub)graph of inferences in the 
system’s transition model. Mappings from sensor inputs to effectors (or in the oppo- 
site direction, in the case of abductive inference) are just specific fragments of this 
overall model, starting or terminating in appropriately designated sensor or effector 
dimensions. Naturally, it is desired that the only hypotheses that are entertained by 
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the system are those which (1) actually describe a possible world and (2) are rele- 
vant to the task at hand. Considered from a ‘traditional symbolist’ perspective, the 
latter is of course equivalent to the well-known ‘Frame Problem’ [219], which, as 
discussed in subsequent chapters, is increasingly understood to have been an artefact 
of coarse-grained and disembodied inference. 
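To make the sensor-to-effector reading of hypotheses concrete, the sketch below treats a toy transition model as a directed graph over named dimensions and enumerates candidate hypotheses as paths from a designated sensor to a designated effector. Abductive inference then walks the same graph in the opposite direction. The graph, the dimension names and the helpers `reverse` and `paths` are hypothetical illustrations, not the representation used in this book.

```python
from typing import Dict, List, Set

# Hypothetical transition model: an edge a -> b stands for an inference from a to b.
MODEL: Dict[str, List[str]] = {
    "camera": ["object_position"],
    "object_position": ["grasp_plan", "collision_risk"],
    "collision_risk": ["brake"],
    "grasp_plan": ["gripper"],
}
SENSORS: Set[str] = {"camera"}
EFFECTORS: Set[str] = {"gripper", "brake"}

def reverse(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge, so that traversal runs from effects back to causes."""
    flipped: Dict[str, List[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped

def paths(graph: Dict[str, List[str]], start: str, goals: Set[str]) -> List[List[str]]:
    """Depth-first enumeration of acyclic paths from start to any goal node."""
    found: List[List[str]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in goals:
            found.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip cycles
                stack.append(path + [nxt])
    return found

# Deductive direction: sensor-to-effector fragments of the model.
print(sorted(paths(MODEL, "camera", EFFECTORS)))
# Abductive direction: from an observed effect back to possible sensor causes.
print(paths(reverse(MODEL), "brake", SENSORS))
```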

As we discuss in detail in Chaps. 7 and 9, it also follows that any reasonable 
candidate architecture for encoding knowledge for general intelligence will be com- 
positional, so as to obtain a semantics for compound hypotheses. Another essential 
property is the notion of strong typing, which enables inheritance/taxonomy and 
the explicit denotation of goals and constraints as regions of a (prospectively open- 
ended) state space. These properties jointly enable structured updates to working 
knowledge that retain the self-reinforcing nature of a scientific theory [89]. 
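One way to picture goals and constraints as typed regions of a state space is sketched below. A region is a named membership predicate over a typed state, and compound regions (intersection, union) are built from components so that the meaning of the compound is determined by the meanings of its parts. The `Region` and `RobotState` classes and the predicates are hypothetical. Note also that Python only checks these annotations with an external tool such as mypy, so the typing here is nominal rather than enforced.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class RobotState:
    x: float        # position along a corridor in metres (illustrative)
    battery: float  # remaining charge in [0, 1] (illustrative)

@dataclass(frozen=True)
class Region:
    """A region of the state space, denoted by a membership predicate."""
    name: str
    contains: Callable[[RobotState], bool]

    def __and__(self, other: "Region") -> "Region":
        # The compound's semantics are a function of its parts' semantics.
        return Region(f"({self.name} & {other.name})",
                      lambda s: self.contains(s) and other.contains(s))

    def __or__(self, other: "Region") -> "Region":
        return Region(f"({self.name} | {other.name})",
                      lambda s: self.contains(s) or other.contains(s))

at_dock = Region("at_dock", lambda s: abs(s.x - 10.0) < 0.5)
safe_charge = Region("safe_charge", lambda s: s.battery >= 0.2)

# Goal: reach the dock, subject to the constraint of keeping a safe charge.
goal = at_dock & safe_charge
print(goal.name, goal.contains(RobotState(x=10.2, battery=0.5)))  # (at_dock & safe_charge) True
print(goal.contains(RobotState(x=10.2, battery=0.1)))             # False
```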

Finally, the modern scientific method requires that all working knowledge should
in principle be falsifiable via empirical observation. Hence, we include as a requirement
of our approach that the base symbols of the expression language must include
denotations which are grounded in this way. Causal modeling also stipulates the 
ability to intervene directly in the environment to learn the effects of one’s own 
agency. Thus, the system and representation language must both support primitives 
for interacting with the environment. 
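The difference between observing and intervening can be illustrated with a toy structural causal model (our own example, with made-up probabilities). A hidden confounder Z drives both X and Y, so X predicts Y observationally, yet forcing X, written do(X = 1), leaves Y unaffected. The variable names and numbers are assumptions chosen for illustration, and probabilities are computed exactly by enumerating the exogenous variables rather than by sampling.

```python
from itertools import product
from typing import Dict, Optional

# Hypothetical SCM with exogenous variables Z and N_X:
#   Z   ~ Bernoulli(0.5)   (hidden confounder)
#   N_X ~ Bernoulli(0.1)   (noise on X)
#   X  := Z xor N_X        (X usually copies Z)
#   Y  := Z                (Y does not depend on X at all)
P_Z, P_NX = 0.5, 0.1

def run_scm(z: int, n_x: int, do_x: Optional[int] = None) -> Dict[str, int]:
    """Evaluate the structural equations; an intervention replaces X's equation."""
    x = (z ^ n_x) if do_x is None else do_x
    y = z
    return {"X": x, "Y": y}

def prob_y1(condition_x: Optional[int] = None, do_x: Optional[int] = None) -> float:
    """Exact P(Y=1 | X=condition_x) or P(Y=1 | do(X=do_x)), by enumeration."""
    num = den = 0.0
    for z, n_x in product((0, 1), repeat=2):
        weight = (P_Z if z else 1 - P_Z) * (P_NX if n_x else 1 - P_NX)
        out = run_scm(z, n_x, do_x)
        if condition_x is not None and out["X"] != condition_x:
            continue  # observational conditioning discards inconsistent worlds
        den += weight
        num += weight * out["Y"]
    return num / den

print(round(prob_y1(condition_x=1), 3))  # observing X = 1: 0.9
print(round(prob_y1(do_x=1), 3))         # setting X = 1: 0.5
```

Only an agent that can set X itself, rather than merely watch it, can distinguish these two cases from data alone.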

We claim that it will not be possible (or economical) to automate human labor 
in the general case until AI also possesses these properties. As such, this is the 
context in which we will highlight the challenges and shortcomings of deep learning, 
reinforcement learning, and other existing AI approaches. To place contemporary 
approaches in the appropriate context, we proceed via a brief historical recapitulation 
of the rise and fall of the traditional symbolist approach. 


2.3 The Death of ‘Good Old-Fashioned AI’ 


The key figures at the inaugural AI conference at Dartmouth [220] were split across 
the nascent symbolist and connectionist divide, with Simon, Newell, and McCarthy 
in the former camp, and Shannon, Minsky, and Rochester in the latter [174]. How- 
ever, the symbolist approach became the prevailing one, not least because of the 
widespread confusion surrounding the solvability of the XOR problem by percep- 
trons [226]. The following decades saw concerted effort in symbolic AI. Many of the 
languages used to construct AI originated from the synthesis of procedural and log- 
ical programming styles, respectively exemplified by LISP and resolution theorem 
provers. Hewitt’s ‘PLANNER’ language [142] was a hybrid of sorts, being able to 
procedurally interpret logical sentences using both forward and backward chaining. 
It was used to construct SHRDLU [366], which was hailed as a major demonstration
of natural language understanding. This inspired other projects such as CYC
[197], an ongoing attempt to create a comprehensive ontology and knowledge base
that seeks to capture ‘common-sense knowledge’. Less ambitious and more successful
were the various expert system projects that started in the 1960s and became
prevalent in the following two decades. These included the MYCIN expert system
for diagnosing infectious disease [349], Dendral for identifying unknown organic
molecules [38], and other well-known systems such as Prospector [135]. Collectively,
such systems became known as ‘Good Old-Fashioned AI’ (GOFAI) [137].


Common Attributes and Roadblocks 


In general, GOFAI fits the template of a knowledge-based system. Such systems 
may be decomposed into two components: a knowledge base and an inference 
engine which answers queries and/or enhances the knowledge base by applying 
inference rules to the current state of the knowledge base. These systems typically 
required users to create the symbolic primitives and inference rules a priori. It gradu- 
ally became understood that such information was difficult to obtain—the so-called 
‘knowledge elicitation bottleneck’. 
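The split between knowledge base and inference engine can be sketched in a few lines. The propositional Horn-clause rules below are hypothetical. `forward_chain` saturates the knowledge base by firing rules until nothing new is derived, while `backward_chain` works from a query back to known facts; these are the two directions in which PLANNER could interpret logical sentences. Every rule and fact has to be written by hand, which is the knowledge elicitation bottleneck in miniature.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical Horn-clause rules: (premises, conclusion).
Rule = Tuple[Tuple[str, ...], str]
RULES: List[Rule] = [
    (("rained",), "ground_wet"),
    (("ground_wet", "below_freezing"), "ice_risk"),
    (("ice_risk",), "grit_roads"),
    (("sprinkler_on",), "lawn_wet"),
]
FACTS: Set[str] = {"rained", "below_freezing"}

def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Data-driven inference: fire rules whose premises hold until a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   seen: FrozenSet[str] = frozenset()) -> bool:
    """Goal-driven inference: prove goal by recursively proving rule premises."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rule sets
        return False
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

print(sorted(forward_chain(FACTS, RULES)))
print(backward_chain("grit_roads", FACTS, RULES))  # True
print(backward_chain("lawn_wet", FACTS, RULES))    # False
```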

Two other key longstanding GOFAI problems are the Qualification Problem and 
the Frame Problem [219], both of which contributed to the perception of scalability 
issues. The Qualification Problem is concerned with the preconditions needed for a 
logical deduction to be valid. The Frame Problem is concerned with the difficulty 
of fully specifying conditions related to invariants of state space transformation.⁵
Another key obstacle to scalability is the ‘cognitive cycle’ of the ‘sense-think-act’
paradigm [240], in which it is assumed that the world evolves in lockstep with the
system. This quickly became a formidable obstacle for early projects such as the
General Problem Solver [238] and PLANNER [142]. At that time, computation was
also far more expensive than it is now; together, these two factors meant that GOFAI
was destined for a reckoning.


The End of GOFAI 


Observers were not optimistic about the prospect of general intelligence using such
rule-based systems. In the early 1970s, the Lighthill Report [202] led to a drastic
reduction in AI research funding by the UK government, and DARPA cut funding
to academic AI research in the USA. In the late 1980s, the United States’ Strategic
Computing Initiative, which was essentially formed to participate in an AI/computing 
race with Japan, cut funding for new AI research as its leaders realized that the 
effort would not produce the full machine intelligence it desired. Simultaneously, 
the market for large-scale expert system hardware collapsed and more affordable 
general-purpose workstations took over. These developments formed part of what is 
now colloquially known as the ‘AI Winter’. 

In hindsight, many of these challenges and the accompanying demise of GOFAI 
could be said to be a function of the hardware of the time. Computing power and memory
were obviously more expensive, and software design decisions (such as choosing
between LISP and C/C++) had a correspondingly disproportionate impact
on what could be computed in practice. A number of companies built around LISP-based
expert systems (such as Symbolics, LISP Machines Inc., Thinking Machines
Corporation, and Lucid Inc.) went bankrupt. Ambitious undertakings were common,
such as the Fifth-Generation Computer Systems (FGCS) project in Japan during the
1980s. The massive investment in building highly parallel computing systems was
ultimately in vain, as simpler, more general architectures such as workstations from
Sun Microsystems and Intel x86 machines became favored for such roles. Some of
those forward-looking ideas have been reinvented in the early 21st century: for
example, the emphasis of FGCS on highly parallel programming is now commonplace
in general-purpose GPU programming with CUDA and OpenCL.

5 It was eventually concluded that solutions exist to these problems, including default logics [347]
and answer set programming [201].


The Problem of the ‘Sense—Think—Act’ Loop 


Regardless of technological advances, the GOFAI paradigm still does not present 
a viable path to general intelligence. For the architectures relying on ungrounded 
knowledge representation, there is no prospect of deploying them to address tasks in 
the real world of complex and noisy data streams. More fundamentally, the absence 
of grounding precludes the understanding of causal relationships of the real world—a 
core aspect of operationalizing the scientific method. Even if a GOFAI system were 
hypothetically to achieve symbol grounding, there would still be a fatal flaw: GOFAI 
never matured sufficiently to escape the scalability problem inherent in the
‘sense-think-act’ loop. As the system’s body of knowledge grows, the time required to make
plans and predictions must also increase. As engineers put it, the system ‘lags behind
the plant’ and faces two options: either to deliver correct action plans but too late,
or to deliver incorrect plans on time [241]. This issue arises essentially from
the synchronous coupling of the agent and its environment, i.e., the latter is expected
to wait politely until the agent completes its deliberations. Technically speaking,
synchronicity means that the agent computes in zero time from the environment’s
perspective.
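The ‘lagging behind the plant’ dilemma can be made concrete with a deliberately simple synchronous model (our own, with made-up constants). The agent’s deliberation time grows linearly with the size of its knowledge base, while the environment moves on after a fixed interval. Beyond a certain size, the agent can either deliver a complete plan late or cut deliberation short and act on time with a plan built from only part of its knowledge, which may therefore be incorrect. Times are kept in integer milliseconds to avoid floating-point edge cases.

```python
# Hypothetical constants, chosen only to illustrate the trade-off.
MS_PER_FACT = 2                  # deliberation cost per fact consulted
WORLD_CHANGE_INTERVAL_MS = 1000  # the environment moves on after this long

def deliberation_ms(kb_size: int) -> int:
    """Time needed to produce a complete plan by consulting every fact once."""
    return kb_size * MS_PER_FACT

def outcome(kb_size: int, budget_ms: int = WORLD_CHANGE_INTERVAL_MS) -> str:
    """Report whether a complete plan fits the budget, and the alternatives if not."""
    needed = deliberation_ms(kb_size)
    if needed <= budget_ms:
        return "complete plan, on time"
    consulted = budget_ms // MS_PER_FACT
    return (f"complete plan {needed - budget_ms} ms late, "
            f"or on-time plan using only {consulted}/{kb_size} facts")

for size in (100, 5000):
    print(size, outcome(size))
```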

Machine learning has failed to acknowledge the significance of this problem and 
has even adopted GOFAI’s synchronous coupling as one of the fundamentals of 
reinforcement learning; see Chap. 3. For now, computation is scaling at a rate that 
can sustain the synchronous abstractions used in large-scale projects (see Sect. 5.2). 
For lower-level routines (such as reactively handling sensory streams of data at a 
fixed frequency) this may suffice. On the other hand, there are certainly aspects of
cognition which are slower, more deliberative and explicitly logical, and this is where
the ‘sense-think-act’ approach breaks down. Some believe that given more resources and
innovations in model architectures, deep learning may be able to encode knowledge 
effectively enough to empower this sort of cognition and meet the requirements for 
general intelligence. In the next chapters, we shall see why this, in fact, cannot be 
the case. 
Chapter 3
Where is My Mind?


It was like claiming that the first monkey that climbed a tree was 
making progress towards landing on the moon. 


Dreyfus, ‘A History of First Step Fallacies’ [73] 


The research field of AI is concerned with devising theories, methods, and workflows 
for producing software artifacts which behave as intelligent subjects. Evidently, intel- 
ligence, as the property of an agent, is not of necessity inherited from the methods 
used to construct it: that a car has been assembled by robots does not make it a robot. 

Unfortunately, even this obvious distinction is sometimes blurred in prominent
published work. To wit: the statement, “an agent that performs sufficiently
well on a sufficiently wide range of tasks is classified as intelligent” was recently 
published by DeepMind [273] to give context to a paper claiming to have developed 
“the first deep RL agent that outperforms the standard human benchmark on all 57 
Atari games” [14]. This invites the inference that the range of the tasks (57 games) 
that have been achieved warrants calling the advertised agent ‘intelligent’. However, 
careful reading of the paper reveals that the authors have in fact developed 57 different 
agents. Granted, this was achieved using the same development method and system 
architecture, but 57 agents were nonetheless trained, rather than the claimed single 
agent. Here is a prime example of distilled confusion: a property (applicability to 57 
tasks) of one construction method (instantiating the Agent57 system architecture) 
has just been ‘magically’ transferred to some 57 artifacts produced by the method. 

This only fuels what Marcus terms the “epidemic of AI misinformation” [211] 
from which the world has been suffering for some years [165]. As a minimum, this 
damages public understanding (impinging on business expectations/validation), trust 
(in scientific deontology), and education at large. Indeed, in common with others [75] 
we are of the opinion that such ‘misinformation at scale’ steers both governance and 
research the wrong way: it gives credence, even among some seasoned researchers
or, worse, the next generation of researchers, to claims that machine learning is the
(only) root of general intelligence, a myth we debunk in the next two chapters. But
first, to give the matter a proper grounding, we must return to the roots of ML in order
to objectively assess its domain of application, appreciate its evident achievements,
and delineate the boundaries of its potential.

© The Author(s) 2022
J. Swan et al., The Road to General Intelligence,
Studies in Computational Intelligence 1049,
https://doi.org/10.1007/978-3-031-08020-3_3


3.1 A Sanity Check 


With the demise of GOFAI came a renewed effort to explore connectionist learning 
paradigms. Over time, the field shifted from the symbolic languages of GOFAI 
to ‘end-to-end’ feature learning. This has loosened the constraints on knowledge 
representation, namely moving away from an expression language with discrete 
symbols to something more representationally amorphous. Learning the parameters 
of a function approximator then becomes a fitting and regularization problem, as 
conceptualized by the bias-variance trade-off. 

Inspired by trial-and-error learning in animals, reinforcement learning (RL) devel- 
oped from work in optimal control, which is concerned with minimizing some metric 
of a dynamical system over time. The techniques developed for solving these prob- 
lems became known as dynamic programming, and also produced the formalism of 
the Markov Decision Process (MDP), which is now essential to RL [331]. It provides
the abstractions of states $s \in S$, actions $a \in A$, and reward function $r: S \times A \to \mathbb{R}$.
The evolution of states is assumed to progress in discrete steps according to the
transition function $P: S \times A \to \Delta(S)$, with a scalar reward $R_t = r(s_t, a_t)$ produced
at each timestep $t$. These ingredients can be modified to accommodate additional
complexity such as partial observability, continuous spaces, and multiple agents. 
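To make these ingredients concrete, the sketch below encodes a two-state MDP with the components named above (states, actions, reward function and transition function) and solves it by value iteration, one of the dynamic programming methods from optimal control. The particular states, probabilities, rewards and discount factor are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state MDP.
S = ["low", "high"]
A = ["wait", "work"]
# P[(s, a)]: list of (next_state, probability); r[(s, a)]: immediate reward.
P: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("low", "wait"): [("low", 1.0)],
    ("low", "work"): [("high", 0.8), ("low", 0.2)],
    ("high", "wait"): [("high", 0.9), ("low", 0.1)],
    ("high", "work"): [("high", 1.0)],
}
r: Dict[Tuple[str, str], float] = {
    ("low", "wait"): 0.0, ("low", "work"): -1.0,
    ("high", "wait"): 2.0, ("high", "work"): 1.0,
}
GAMMA = 0.9  # discount factor

def q_value(s: str, a: str, V: Dict[str, float]) -> float:
    """Expected discounted return of taking a in s, then following values V."""
    return r[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)])

def value_iteration(tol: float = 1e-10) -> Dict[str, float]:
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in S}
    while True:
        new_V = {s: max(q_value(s, a, V) for a in A) for s in S}
        if max(abs(new_V[s] - V[s]) for s in S) < tol:
            return new_V
        V = new_V

V = value_iteration()
policy = {s: max(A, key=lambda a: q_value(s, a, V)) for s in S}
print(policy)  # {'low': 'work', 'high': 'wait'}
```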

The purpose of RL is to optimize a value function based on sampled rewards 
which are stationary and specified a priori, in order to produce an agent (called a 
controller) consisting essentially of a policy that maps states to actions. It does so by 
shaping the policy encoded in a neural network,¹ generally either via gradient descent
over its weights or by ‘neuro-evolution’, i.e., optimizing the fitness of a policy within
a population thereof. Training is unsupervised and uses for its ground truth a corpus
of trials and errors logged from a number of sessions of interaction (episodes)
with a simulated world—the number of episodes grows with the complexity of the 
task/environment distribution and is generally enormous (in other words, the sam- 
ple efficiency is inordinately low). Note that, although it is possible in principle to 
perform trials and errors in the real world instead of in a simulator, this is rarely the 
case in practice, for fear of wear and tear of equipment and safety risks (on sample 
inefficiency and safety concerns, see Sect. 5.2). 


1 In the early days, policies were encoded as mere lookup tables (as in Q-learning [208]), but this could
not scale with the increasing dimensionality of the problems; hence the need to compress such
policies via deep neural networks, and hence ‘deep RL’.
while True do
    t ← 0; e_t ← False
    Reset environment to get o_0
    Observe o_0
    while not e_t do
        a_t ← π(o_t)
        Take action a_t; obtain step η_t = (r_t, o_{t+1}, e_{t+1})
        Observe (a_t, η_t)
        Update the policy π
        t ← t + 1
    end while
end while


Fig. 3.1 State-of-the-art high-level RL training procedure (Hoffman et al. [143]). The fact that, in 
practice, the procedure is implemented in sophisticated ways does not change its fundamentals.


As illustrated in Fig. 3.1, the training process is synchronously coupled with 
the environment. The procedure it implements consists of successive sessions of 
interaction, each composed of a number of cycles: a cycle starts by putting the 
simulator on pause, invoking a subprocedure for guessing the next action based on 
the previous world state, and feeding the action to the simulator, which responds
immediately with a reward; the collected rewards are ultimately used to adjust the
policy. The simulator is then resumed, updates its state, and a new cycle starts.
At some point, another subprocedure decides to stop the cycle and a new session is
initiated by resetting the simulator. When the controller performs well enough over
selected samples of the training distribution, the procedure halts. Note that the test
distribution is required to be the same as the training one.
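The structure of Fig. 3.1 can be reproduced in a short, runnable sketch. Here the ‘simulator’ is a hypothetical five-cell corridor, the policy is an epsilon-greedy lookup table updated by tabular Q-learning (the pre-deep-RL style of policy), and the hyperparameters are arbitrary. The environment does nothing between calls to `step`: it simply waits for the agent, which is exactly the synchronous coupling discussed above.

```python
import random
from typing import List, Tuple

class Corridor:
    """Hypothetical simulator: cells 0..length-1, reward 1.0 on reaching the last cell."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[float, int, bool]:
        # Nothing happens in this "world" between calls: it waits for the agent.
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        done = self.pos == self.length - 1
        return (1.0 if done else 0.0), self.pos, done

def train(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.1, max_steps: int = 100, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # lookup-table policy: q[state][action]
    for _ in range(episodes):                    # outer loop: sessions (episodes)
        obs, done, t = env.reset(), False, 0
        while not done and t < max_steps:        # inner loop: synchronous cycles
            if rng.random() < epsilon or q[obs][0] == q[obs][1]:
                action = rng.randrange(2)        # explore, or break ties at random
            else:
                action = 0 if q[obs][0] > q[obs][1] else 1
            reward, next_obs, done = env.step(action)
            target = reward + (0.0 if done else gamma * max(q[next_obs]))
            q[obs][action] += alpha * (target - q[obs][action])  # update the policy
            obs, t = next_obs, t + 1
    return q

q = train()
# Greedy action in each non-terminal cell (1 = move right, 0 = move left).
greedy = [0 if left > right else 1 for left, right in q[:-1]]
print(greedy)
```

Once training halts, the table is frozen; the deployed controller only reads it.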

This is the basic procedure. Depending on requirements, the procedure is generally
augmented with various support systems such as short-term memory, episodic mem- 
ory, improved guessing subprocedures (such as intrinsic motivation heuristics and 
meta-learning, i.e., in-training adjustment of the guessing subprocedure), world mod- 
els (model-based RL), hierarchies of policies (hierarchical RL), and so on. Regardless 
of its sophistication, the procedure is neither learned nor performed by the controller
itself: it is instead designed and performed (using a dedicated tool chain) by human 
workers. Once deployed in production, the controller matches, repeatedly, the same 
learned policy against sensory inputs and directs the resulting actions to its actuators 
(‘inferencing’ in ML parlance). In the main, such ‘inferencing’ merely amounts to 
applying the neural network encoding the policy to an input vector thus producing an 
output vector, i.e., a single mathematical operation. Regardless of how this operation 
is implemented, the controller is purely reactive (one could say ‘blind’) and does not
make any situated deliberation worthy of the name, let alone take any initiative.
As Hervé Bourlard (a prominent ML researcher, head of the Swiss Idiap Research 
Institute) recognized recently in Forbes, “Artificial intelligence has no intelligence” 
[44]. 
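The ‘single mathematical operation’ of deployed inferencing can be seen in a minimal sketch of a controller. The weights are frozen after training, and every control step applies the same function to the current input vector. The weights, inputs and action names below are hypothetical, and a real policy network would have several layers, but the deployed computation is still a fixed function evaluation with no deliberation.

```python
from typing import List, Sequence

# Hypothetical frozen weights: one row per action, one column per sensor input.
W: List[List[float]] = [
    [0.8, -0.2, 0.1],   # action 0: "steer left"
    [-0.3, 0.9, 0.0],   # action 1: "steer right"
    [0.1, 0.1, 0.7],    # action 2: "brake"
]
ACTIONS = ["steer left", "steer right", "brake"]

def policy(observation: Sequence[float]) -> str:
    """Deployed 'inferencing': a fixed linear map followed by an argmax."""
    scores = [sum(w * x for w, x in zip(row, observation)) for row in W]
    return ACTIONS[max(range(len(scores)), key=scores.__getitem__)]

# Identical inputs always yield identical actions; nothing is learned at run time.
print(policy([0.0, 1.0, 0.2]))  # steer right
```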
In technical terms, a typical RL policy is a ‘curried planner’, i.e., a planner that has been
curried² with respect to a predefined goal/environment coupling. This is diametrically
opposed to what one expects from a system that learns to respond to arbitrary orders
from its user in a world over which system designers have little or no control. Such a
discrepancy is not contingent: it bears witness to the orthogonality of the respective
ambitions of RL and general intelligence.
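The ‘curried planner’ analogy can be sketched directly. The toy `planner` below is hypothetical: given a goal and an environment description, it can choose an action for any state. Currying it and fixing the goal and environment in advance leaves a function from states to actions, i.e., a policy, which can no longer be asked to pursue anything else.

```python
from typing import Callable, Dict

def planner(goal: int, env: Dict[str, int], state: int) -> str:
    """Toy planner on a bounded 1-D line: move one step toward the goal."""
    if state < goal and state < env["max_pos"]:
        return "right"
    if state > goal and state > env["min_pos"]:
        return "left"
    return "stay"

def curry_planner(goal: int) -> Callable[[Dict[str, int]], Callable[[int], str]]:
    """Curried form: planner(goal, env, state) == curry_planner(goal)(env)(state)."""
    def with_env(env: Dict[str, int]) -> Callable[[int], str]:
        def policy(state: int) -> str:
            return planner(goal, env, state)
        return policy
    return with_env

ENV = {"min_pos": 0, "max_pos": 10}

# Fixing the goal/environment coupling in advance leaves a state -> action policy.
policy = curry_planner(7)(ENV)
print(policy(3), policy(9), policy(7))  # right left stay

# Pursuing a different goal means producing (in RL terms, retraining) a new policy.
other_policy = curry_planner(2)(ENV)
```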

The purpose of RL is essentially to automate the production of software dedicated 
to handling a specific task in a specific environment, both defined a priori. From an 
engineering perspective, RL amounts to a compilation procedure. Granted, it com- 
piles interaction logs instead of, say, C++ code, but compilation it does nonetheless 
(‘compression’ being the preferred nomenclature in ML). Assuming the resources 
required to build a fast³ simulator are available (domain knowledge, funds, etc.),
if one can guarantee that the world and the task will forever remain as they were
during training,⁴ then RL constitutes a valid cost-efficient alternative to standard
engineering procedures for some classes of problems. 


3.2 Real-World Machine Learning 


ML has been successfully applied to automate tasks of increasing apparent complexity,
such as playing video games, albeit demonstrating very little with regard to
real-world applications, where much higher levels of complexity are the norm.

In contrast, the much less publicized application of ML to industrial automa- 
tion is a far better and more rigorous exemplar of the value of ML in generating 
fast and accurate controllers for complex machinery in unforgiving environments. 
Examples abound: improving cooling in data centers, optimizing network routing for 
high-performance computing, optimizing chip layout, predictive maintenance, robot 
manipulation, digital twinning, etc. In this significant business-relevant and con- 
strained domain, ML enjoys steady progress towards broader applicability—again, 
the fundamental principles are not challenged, nor do they need to be for the time 
being. In particular, motivated by industrial requirements, it is of critical importance 
to keep policy-defined behaviors within prescribed operational envelopes and this is 
an area of ongoing research [48, 100, 231, 270, 275, 346]. 

To re-emphasize: according to the expectations of the ML method, models (‘policies’
in the case of RL) are produced in the lab according to client-provided specifications;
these models are then frozen to constitute the core of deployed controllers.
When the environment or task changes, or whenever there is a need for improvement,
new models are trained in the manufacturer’s lab to replace the old ones, following
the standard procedure of software update. This is unsurprising: like any other software
engineering method, ML addresses predefined controlled conditions. A claim
of ‘intelligence’ is neither made nor required; client-side engineers know what to
expect.

2 https://en.wikipedia.org/wiki/Currying.
3 Because the RL procedure is synchronous and highly sample-inefficient, simulators must be fast
enough to keep the training time within manageable limits.
4 The task is fixed and the deployment environment is drawn from the same distribution as the one
used for training.

Note that, in compliance with the continuous integration/continuous delivery 
paradigm (CI/CD) that pervades modern software engineering (for better or worse), 
it is possible to accelerate the rate of the updates. In principle, this is achieved by 
expanding the training distributions via the parallel aggregation of multiple data 
sources (as long as they are deemed, by humans in the loop, to pertain to the same
task/environment), then training new policies in the background and updating the
deployed controllers asynchronously. 
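The asynchronous update scheme can be sketched as a controller that always serves the most recently published frozen policy, while a background thread ‘trains’ and publishes a replacement. The class and function names are hypothetical, and training is stubbed out by a function that simply returns a new policy. The point is only that serving never waits on training, and that a published policy is never modified, merely replaced.

```python
import threading
from typing import Callable

Policy = Callable[[float], str]

class DeployedController:
    """Serves the current frozen policy; swaps it atomically when a new one arrives."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy
        self._lock = threading.Lock()

    def act(self, observation: float) -> str:
        with self._lock:
            policy = self._policy
        return policy(observation)  # purely reactive evaluation

    def publish(self, new_policy: Policy) -> None:
        with self._lock:
            self._policy = new_policy

def train_in_background(controller: DeployedController) -> threading.Thread:
    def job() -> None:
        # Stand-in for retraining on an expanded training distribution.
        new_threshold = 0.3
        controller.publish(lambda obs: "alarm" if obs > new_threshold else "ok")
    thread = threading.Thread(target=job)
    thread.start()
    return thread

controller = DeployedController(lambda obs: "alarm" if obs > 0.5 else "ok")
print(controller.act(0.4))  # original policy: ok
train_in_background(controller).join()
print(controller.act(0.4))  # updated policy: alarm
```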

Beyond direct application in standalone deployments, ML-generated controllers 
are also used as components of larger systems. For example, some of the building 
blocks of the Multi-level Darwinist Brain architecture (MDB) [20] consist of policies 
operating on world models. Reminiscent of hierarchical RL, the selection of a policy 
among the repertoire is itself a policy, subjected to a satisfaction model (intrinsic moti- 
vation). World models and the satisfaction model are initially given by the designers, 
with some guarantees that they are general enough to support a predetermined set of
tasks and environmental conditions. Learning of policies and models is performed 
offline via neuro-evolution, subject to hand-crafted value functions. In that sense, 
an MDB controller still depends critically on the foresight of its designers, which 
limits its potential for autonomy. Yet it can be altered after deployment (incurring 
some inevitable downtime) in an incremental fashion at a cost arguably lower than 
that of re-engineering from scratch. This line of work culminates for example in the 
DREAM architecture [68]. Whereas MDB representations are tuned, ad hoc, to indi- 
vidual control policies, DREAM introduces a generic slow refactoring loop (called 
‘re-representation’) that continually extracts common denominators from existing 
representations in order to form more abstract ones to eventually facilitate transfer 
learning, in support of a user-defined training curriculum. 

There is no contention about the value of ML as long as claims remain aligned 
with established principles and proven capabilities. As we have seen, whereas an 
agent is learned, the deployed agent does not itself learn or act autonomously. This 
is fully consistent with both the principles of ML and its goals and makes perfect 
sense from an engineering/economic point of view. What is obviously more debatable 
is the claim that a process designed for manufacturing purely reactive software will 
eventually produce thinking machines. The next two chapters will question, in depth, 
the plausibility of such a prophecy. 
Chapter 4
Challenges for Deep Learning


Are you a good noticer? Do you notice things well? I mean, for 
instance, supposing you saw two cock-starlings on an apple-tree, 
and you only took one good look at them — would you be able to 
tell one from the other if you saw them again the next day? 


Hugh Lofting, ‘The Voyages of Doctor Dolittle’ [205] 


Deep learning (DL) has emerged as the dominant branch of machine learning, becom- 
ing the state of the art for machine intelligence in various domains. As discussed in 
the previous chapter, this has led some researchers to believe that deep learning 
could hypothetically scale to achieve general intelligence. However, there is increasing
consensus (e.g. [57, 210, 230]) that the techniques do not scale to harder problems
as well as was anticipated.

In particular, deep learning methods find their strength in automatically synthe- 
sizing distributed quantitative features from data. These features are useful insofar 
as they enable mostly reliable classification and regression, and in some limited 
cases also few- or zero-shot transfer to related tasks. However, it is increasingly 
questionable whether deep learning methods are appropriate for autonomous roles 
in environments that are not strongly constrained. While there are still countless use- 
cases for narrow artificial intelligence, many of the truly transformative use-cases 
can only be realized by general intelligence. 

We recall from Sect. 2.2 that, while we do not know the internal mechanisms 
of human general intelligence, we observe that ‘science as extended mind’ is a
pragmatic description of a general intelligence model of the environment. However, 
neural network representations are not readily interpretable, either to humans or more 
importantly—as we subsequently argue at length—to the learning process itself. 

Fig. 4.1 In this chapter we argue that the lack of structure in representation languages created via
deep learning is in conflict with the requirements of general intelligence

This chapter has two purposes: the first is to explain which properties are lacking
(their relationship to the entire book is shown in Fig. 4.1) and to elicit fundamental
obstacles posed by deep learning. The second purpose is to argue that ‘science as
extended mind’ offers a more effective perspective for designing the desired system.
The latter is further developed in Sect. 7.2 and is the foundation of our proposed
inference mechanisms in Chap. 9. The following claims contrast deep learning with
the requirements for operationalization of the scientific method:


• Representations are not compositional, which makes them inefficient for modeling
long-tailed distributions or hierarchical knowledge.

• Representations are not strongly typed, which prevents verification against
adversarial scenarios and hinders generalization to new domains.

• Representations are generated by models which do not support reflection, which
restricts model improvement to gradient-based methods.


4.1 Compositionality 


There has recently been increasing emphasis on the importance of compositionality 
[120] for machine learning. To take a famous example from AI history [149, 304], 
humans do not require an a priori hypothesis to react to the outlier case of ‘goat enters 
restaurant’. Knowledge about goats can be freely composed with knowledge about 
restaurants; for example, the sudden arrival of a goat would not generally be expected 
to preserve a tranquil dining atmosphere or good standards of hygiene [281]. It has 
recently been stated by some of the most renowned exponents of deep learning [21] 
that: 


We believe that deep networks excel because they exploit a particular form of composition- 
ality in which features in one layer are combined in many different ways to create more 
abstract features in the next layer. 
However, this notion is far weaker than is actually required, in particular for
purposes of AI safety but (as we subsequently argue) also for scalable inference 
via greater sample efficiency. The weakness of this notion of compositionality is 
evidenced by numerous challenges for deep learning (discussed in more detail in 
this and subsequent chapters): 


• Adversarial examples.

• Weak generalization capability.

• Inability to explicitly induce or propagate more than a small number of types of
invariant (translation, rotation, etc.).


Indeed, it could be said that DL is closer to the merely syntactic notion of compos- 
ability than the semantic notion of compositionality. In its most degenerate form, the 
syntactic notion is merely the observation that feature ensembles are instances of the 
‘composite’ design pattern [101, 369] and hence hierarchically aggregated features 
are syntactically substitutable for individual ones. However, that does not impose any 
intrinsic constraints on what the features represent or what the ensembles compute. 
In contrast, compositionality is defined as [207]: 


The algebraic capacity to understand and produce novel combinations from known compo- 
nents. 


The term ‘algebraic’ here effectively means ‘having well-defined semantics’, in the 
sense that the behaviour of a composite exhibits constraints that are a function of 
those of its component parts. The alleged compositionality of DL falls short of
this definition in almost every respect: in algebraic terminology, the
feature representations in DL layers can be ‘freely composed’. In contrast, in Chap. 9
we describe a mechanism for imposing a denotational semantics on composite rep- 
resentations. 
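The contrast between syntactic composability and algebraic compositionality can be sketched with typed components. In the hypothetical example below, each component declares the interval of values it accepts and the interval it produces. Composing two components checks that their interfaces fit and derives the composite’s interface from those of its parts, so the composite’s constraints are a function of its components’ constraints. A ‘free’ composition would instead accept any pair. The `Component` class and interval bounds are our own illustration, not the mechanism of Chap. 9.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Interval = Tuple[float, float]

@dataclass(frozen=True)
class Component:
    name: str
    domain: Interval    # inputs the component is specified for
    codomain: Interval  # declared range of its outputs
    fn: Callable[[float], float]

    def then(self, other: "Component") -> "Component":
        """Typed composition: self followed by other, only if the interfaces fit."""
        lo, hi = self.codomain
        if not (other.domain[0] <= lo and hi <= other.domain[1]):
            raise TypeError(f"{self.name} outputs {self.codomain}, "
                            f"but {other.name} accepts {other.domain}")
        # The composite's interface is derived from those of its parts.
        return Component(f"{self.name}>>{other.name}", self.domain,
                         other.codomain, lambda x: other.fn(self.fn(x)))

normalize = Component("normalize", (0.0, 255.0), (0.0, 1.0), lambda x: x / 255.0)
gamma = Component("gamma", (0.0, 1.0), (0.0, 1.0), lambda x: x ** 2.2)
to_byte = Component("to_byte", (0.0, 1.0), (0.0, 255.0), lambda x: round(255 * x))

pipeline = normalize.then(gamma).then(to_byte)
print(pipeline.name, pipeline.domain, pipeline.codomain)

try:
    to_byte.then(gamma)  # outputs in 0..255 do not fit inputs in 0..1
except TypeError as err:
    print("rejected:", err)
```

This sketch only checks declared interfaces; it does not verify that each `fn` actually respects its declared codomain, which a fuller treatment would also need to establish.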

Hence, the only property in ML for which there is a guarantee of generalized, end- 
to-end compositionality is differentiability [90]. If, as seems likely, it is necessary to 
express more directly whether or not some desired property is compositional, then 
this requires extending DL far beyond ‘differentiable programming’. In common 
practice, composability in DL consists of assembling a network from constituent 
parts which may be trained ‘end-to-end’. Usually, this follows the encoder-decoder 
pattern where the encoder is responsible for generating vectorized features, and the 
decoder maps the features for classification or regression. This paradigm is common 
in deep learning applied to sequential data or labels. Example domains are text- 
to-text [66, 276, 330, 370], image-to-text (and vice-versa) [169, 173, 283], and 
program synthesis [46, 123, 124, 284]. When explicitly tasked with the generation of 
compositional representations, neural networks have been observed to exhibit better 
generalization performance [5]. However, as observed throughout the literature (e.g. 
Liska et al. [204]), more complex architectures tend not to scale well, showing limited 
scope. We argue the following as the most salient consequence: 
Claim 1: DL and Compositionality


Deep learning appears to be fundamentally limited in its ability to create com- 
positional knowledge representations. This severely inhibits the effective for- 
mation and use of structured, hierarchical knowledge, which in turn results in 
weak performance in domains with long-tailed distributions. 


Hierarchical Behavior and Representations 

In advance of more detailed discussion in the following chapter, we briefly consider 
here some work from deep reinforcement learning (DRL), so-called for its use of 
DL for knowledge representation. Deep reinforcement learning has seen much work 
on encapsulating learned behaviors into skills [94, 199, 222, 312] and options [13] 
to be composed in a hierarchical fashion. Researchers are motivated by the potential
for hierarchy to shorten planning horizons, reduce branching factors, and improve sample
efficiency. Despite interesting results that point to progress in these directions (e.g.
[170]), it is not clear whether these approaches scale to more difficult problems. 
Nachum et al. [235] study these methods in particular and find that the benefits 
of hierarchical policy composition have more to do with exploration than with the 
imposed structure, and that the same benefits can be obtained with a modified explo- 
ration technique and a ‘flat’ policy. 
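The hoped-for reduction in planning horizon and branching factor can be made concrete with a little arithmetic. The sketch below counts the action sequences a flat planner must consider against the sequences over temporally extended options. The numbers of actions and options, and the assumption that every option lasts exactly `option_length` primitive steps, are hypothetical simplifications, not properties of the cited methods.

```python
def flat_sequences(num_actions: int, horizon: int) -> int:
    """Number of distinct primitive action sequences of length `horizon`."""
    return num_actions ** horizon


def option_sequences(num_options: int, horizon: int, option_length: int) -> int:
    """Number of option sequences covering the same horizon, assuming (as a
    simplification) that every option runs for exactly `option_length` steps."""
    if horizon % option_length != 0:
        raise ValueError("horizon must be a multiple of option_length")
    return num_options ** (horizon // option_length)


flat = flat_sequences(num_actions=4, horizon=12)                   # 4**12
hier = option_sequences(num_options=8, horizon=12, option_length=4)  # 8**3
print(flat, hier)  # 16777216 512
```

The saving depends entirely on a small set of options sufficing for the task. This is why the finding of Nachum et al. [235], that the observed gains stem mainly from exploration, matters.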

Other work [65] explores embeddings for tasks which can be composed arith- 
metically, in a similar manner to deep word embeddings [223]. However, subsequent 
work on sentence and document embeddings [58, 125, 256] suggests arithmetic com- 
positionality of properties encoded via embeddings is a difficult constraint and that 
little besides differentiability is scalable for compositional representations. In certain 
settings, recursion can effectively be used to hierarchically compose the interfaces of 
deep learning architectures [41, 244]. However, this composition is still at a coarse 
granularity, and it appears unlikely that arbitrary properties can be composed by this 
means. As such, it cannot be said that DL gives scalable solutions for building hier- 
archical knowledge, and this will most certainly limit DL’s overall scalability toward 
general intelligence. 


Robustness in Long-Tailed Distributions 

It is important to note that practically all domains of interest contain long-tailed dis- 
tributions, particularly if they are grounded in the real world. Indeed, it can be argued 
that if long tails are not encountered in the data, then that represents a limitation of the 
dataset and subsequently the evaluation of the model. For example, an autonomous 
vehicle in an unconstrained environment will need to deal with an endless Borgesian 
list of edge cases [274]: 


• Telling the difference between a shallow puddle and an impassable flood.

• Obeying signage written in human language for an intelligent human being to
understand.

• Figuring out what another driver means when they flash their headlights.

• Badly scuffed or missing road/lane markings.

• Pulling on to a kerb or through a red light to allow an emergency vehicle to pass.

• Correctly determining from context whether a traffic light is knocked askew or
genuinely pointed at a different lane of traffic from this one.

• ...and so on, ad infinitum.


It has been observed that symbolic representations are well-suited for long-tailed 
distributions because of the potential to map recursive expression syntax into complex 
semantics [214]. In contrast, the issue of long-tailed distributions is not sufficiently 
emphasized in current deep learning research. This is evident both in the confidence
placed in natural language processing models and in the emerging scrutiny of them.
Despite more comprehensive benchmarks such as GLUE [354], the combinatorial nature of natural
language expressions acts in direct opposition to the notion that a ‘representative’ 
training corpus can be reasonably sized. When highly-parameterized models such 
as GPT-2 [276] are thoroughly analyzed [206, 212], they reveal an understanding 
merely on the level of association, far from the depth required for anything like 
human-level understanding of what has been parsed. This problem has also been 
repeatedly identified in neural program synthesis, where program induction should 
be robust to all unseen inputs. For example, the Neural GPU [163] was trained to 
do long addition and multiplication. While results suggested robustness for prob- 
lems into hundreds of digits in size, further work [272] revealed weaknesses when 
doing arithmetic involving many consecutive carry operations. Concurrently, many other
examples emerged of neural networks attempting program induction for arithmetic
[183, 284, 303, 373], and these likewise tended to fail on outlier cases. These
examples clearly show that, even in domains which can
be formally characterized, deep learning in its current form will not be of much use 
in the many cases where crucial inputs come from long-tailed distributions. 
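Why long carry chains are a long-tail case can be checked directly. The Python sketch below illustrates the point; it does not reproduce the Neural GPU experiments. It measures the longest run of consecutive carries when adding two numbers and estimates how often uniformly random operands produce a long run. The digit counts and thresholds are arbitrary choices.

```python
import random


def longest_carry_chain(a: int, b: int) -> int:
    """Longest run of consecutive carries in schoolbook base-10 addition a + b."""
    longest = run = carry = 0
    while a > 0 or b > 0 or carry:
        digit_sum = a % 10 + b % 10 + carry
        carry = 1 if digit_sum >= 10 else 0
        run = run + 1 if carry else 0
        longest = max(longest, run)
        a //= 10
        b //= 10
    return longest


def fraction_with_long_chain(num_digits: int, min_chain: int, trials: int, seed: int = 0) -> float:
    """Fraction of random num_digits-digit operand pairs whose sum needs a
    carry chain of at least min_chain digits."""
    rng = random.Random(seed)
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    hits = sum(
        longest_carry_chain(rng.randint(lo, hi), rng.randint(lo, hi)) >= min_chain
        for _ in range(trials)
    )
    return hits / trials


# A worst case that random sampling almost never produces: 999...9 + 1.
print(longest_carry_chain(10 ** 30 - 1, 1))           # 30
print(fraction_with_long_chain(50, 20, trials=5000))  # typically close to 0
```

For uniformly random digits, a carry continues to the next position with probability 0.55 (the two digits must sum to at least 9). The frequency of a chain of length k therefore decays roughly geometrically in k, so a model trained on such data rarely sees the inputs on which it later fails.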


4.2 Strong Typing 


In Chap. 2, we introduced two concepts claimed to be essential to general intelligence:
‘work on command’ and ‘science as extended mind’. In the previous section we
argued for the necessity of compositionality. The primary motivation for that argument
is the need for scalability and robustness in the presence of long-tailed distributions.
In this section, we argue for a further claim regarding representations, stated
below.


Claim 2: Deep Learning and Types 


Deep learning is not designed for generating typed representations. This defi- 
ciency is prohibitive for developing general intelligence, since strong typing is 
essential for invariant propagation, inheritance, verification, and rapid adapta- 
tion of existing inferences to new observations. 
Types can be used to explicitly delineate subregions of a state space, which is
important for specifying constraints and objectives given to an agent as well as 
the hypotheses constructed by an agent for explaining causal mechanisms. Deep 
learning essentially concerns itself with only a single type — that of numeric vectors, 
even for the incredibly large models which are increasingly used. The meanings of 
intermediate representations remain opaque and, we argue, underconstrained. For 
example, the statistical and observational nature of supervised learning means that 
training and test error can converge favorably without the constraint that intermediate 
representations capture causal relationships. This observation has raised widespread 
concern for the biases that may be present in deployed models which are used in 
sensitive situations such as loan approvals and prison sentencing [132, 359]. 
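As a minimal sketch of what delineating a subregion of a state space can mean in practice, the Python below wraps a raw number in a type whose constructor enforces an invariant. The `Speed` and `VehicleState` types and the bound of 0 to 70 m/s are hypothetical illustrations. A plain feature vector carries no such constraint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Speed:
    """Hypothetical typed quantity: only values in [0, 70] m/s inhabit this type."""
    metres_per_second: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.metres_per_second <= 70.0:
            raise ValueError(f"not a valid speed: {self.metres_per_second} m/s")


@dataclass(frozen=True)
class VehicleState:
    """Hypothetical typed state: each field says what it means and how it is constrained."""
    lane: int
    speed: Speed


# An untyped feature vector: nothing marks the second entry as an invalid speed.
raw_features: List[float] = [1.0, -12.5]

state = VehicleState(lane=1, speed=Speed(13.9))
try:
    VehicleState(lane=int(raw_features[0]), speed=Speed(raw_features[1]))
except ValueError as err:
    print(err)  # not a valid speed: -12.5 m/s
```

Nothing prevents a learned network from emitting -12.5 in the same vector slot. The type makes the admissible region explicit and checkable, which is the property the text argues deep learning lacks.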


Adversarial Examples and i.i.d. Assumptions 

Instead of type constraints, deep learning is built upon assumptions about the dis- 
tributions of data to which it is applied. Most consequential is the requirement that 
training and test data are both independently and identically distributed (i.i.d.). This 
condition is essential for the strong convergence guarantees which are derived in sta- 
tistical learning theory. In that context, such assumptions are perfectly reasonable, 
but are ill-matched for general knowledge representation and learning. The clearest 
indication of this is the existence of adversarial examples. 

In the most common formulation, adversarial examples are minutely perturbed
inputs specifically designed to severely reduce the supervised accuracy of deep learning
models [121, 335]. Adversarial training has been developed in response, but new
weaknesses have emerged [345] and an all-encompassing solution remains elusive. 
Meanwhile, this vulnerability has been confirmed in various scenarios related to 
image classification in the real world [184, 341] and variations applicable to rein- 
forcement learning agents [112, 151] and natural language models [2, 158, 233, 
249]. 
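The mechanism behind minutely perturbed inputs can be seen even without a deep network. The sketch below applies a fast-gradient-sign style perturbation to a hypothetical random linear classifier in pure Python. This construction is assumed here for illustration and is not claimed to be the method of the cited works. Each input coordinate moves by at most `eps`, yet the score moves by `eps` times the L1 norm of the weights, which grows with the input dimension.

```python
import random


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def score(w, x):
    """Linear decision score w . x; the predicted label is its sign."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fgsm(w, x, y, eps):
    """Fast-gradient-sign step for logistic loss log(1 + exp(-y * w.x)).
    The input gradient is -y * sigmoid(-y * w.x) * w, so its sign is -y * sign(w_i)."""
    return [xi - eps * y * sign(wi) for wi, xi in zip(w, x)]


rng = random.Random(0)
dim = 10_000                                   # e.g. a 100 x 100 grey-scale image
w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical trained weights
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # a clean input
y = 1 if score(w, x) > 0 else -1               # label it with the model's own prediction

x_adv = fgsm(w, x, y, eps=0.1)
print(max(abs(a - b) for a, b in zip(x_adv, x)))  # per-coordinate change: about 0.1
print(y * score(w, x), y * score(w, x_adv))       # positive margin, then negative
```

This linear account is one standard intuition for adversarial vulnerability in high-dimensional inputs. It is offered as intuition, not as a full account of deep networks.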

Adversarial examples are intriguing to humans since our perceptual systems are 
much more robust to such attacks. We argue that typed representations would prevent 
such catastrophic misrecognitions that have no clearly explainable origin in terms 
of DL parameter weightings. Objects can essentially be conceptualized as a set of 
invariances, such as shape being invariant to the brightness of shone light, texture 
being invariant to orientation, etc. These invariances anchor qualitative descriptions 
and allow us to construct part-whole and inheritance relationships through deduction 
[332]. 

Importantly, we cannot entirely eliminate high-dimensional inputs, since ground- 
ing is essential for a science-oriented agent. Instead, we argue that inducing types 
from raw high-dimensional data should be prioritized to occur at the lowest possible 
hierarchical level, since any higher level inference would benefit from the stabil- 
ity and clarity of typed language elements. We contrast this proposal with current 
techniques still bound to the i.i.d. paradigm, such as domain randomization [343]. 
While this has shown positive results in complex scenarios [31, 247, 255], there 
are currently no compelling reasons to believe that the method is scalable to lev- 
els required by general intelligence, providing only a relatively crude way to learn 
invariances which does not involve the distillation of new types. This of course is an
open challenge and we discuss it further in Chap. 9. 
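In its simplest form, domain randomization re-samples simulator parameters for every training episode, so that a policy sees many i.i.d. variations of the same task. The sketch below shows only that sampling step. The parameter names and ranges are hypothetical, and the training loop that would consume them is omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SimParams:
    """Hypothetical simulator settings varied between episodes."""
    friction: float
    object_mass_kg: float
    light_intensity: float


# Hypothetical, illustrative ranges for each randomized property.
RANGES: Dict[str, Tuple[float, float]] = {
    "friction": (0.5, 1.5),
    "object_mass_kg": (0.05, 0.5),
    "light_intensity": (0.2, 1.0),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one randomized environment, each parameter uniform over its range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def randomized_episodes(num_episodes: int, seed: int = 0) -> List[SimParams]:
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(num_episodes)]


for params in randomized_episodes(3):
    print(params)
```

Any invariance the policy acquires (say, to object mass) remains implicit in its weights, fitted to the sampled ranges. Nothing here names mass as irrelevant or yields a new type, which is the limitation the text points to.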


Out-of-Domain Generalization and Meta-learning 

An essential trait for general intelligence is the ability to efficiently leverage learned 
knowledge when facing a novel yet related domain. Existing literature describes 
techniques for domain adaptation, where a model can perform in another domain 
with few or no labeled examples. Domain-invariant feature learning [102, 232] and 
adversarial training methods [348, 352] have shown positive results for deep net- 
works. Transfer learning and semi-supervised learning for deep neural networks are 
also well-studied topics. 

Nonetheless, there is consensus that there remains much to be desired from deep 
learning in this regard. We argue that typed representations are the natural way to 
address this requirement for general intelligence. Humans are naturally capable of 
seeing new situations as modified versions of previous experience. In other words, 
there is an abstract type of which both the prior observation and the current stimuli 
are examples, but with certain attributes differing. Given enough new observations, 
it may be appropriate to reify a different type altogether. Rapid domain adaptation 
can also be modeled as a scientific exercise of determining an unknown type with 
minimal experimentation. We expand on this perspective in Sect. 7.2. 

Meta-learning has emerged as a popular research topic aiming to expand the 
generalization of deep learning systems [86, 200, 227, 239, 286]. These methods 
train models to a location in parameter space which allows for efficient adaptation 
to unseen tasks as opposed to unseen data points. Conceptually, this may appear to 
expand generalization capacity. However, the framework makes assumptions about 
tasks coming from the same distribution, much as for individual data points in a 
dataset. As such, it suffers similar issues, such as inflexibility to non-stationarity 
in tasks. More related to generalization, meta-learning does not yield transferable 
abstractions, rather it gives an optimized starting point for creating adaptable models. 
As argued by Chao et al. [45], meta-learning is not fundamentally all that different 
from supervised learning. This makes it unlikely to truly resolve the challenges of 
generalization when the scope or nature of tasks is broadened.
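The assumption that tasks share a distribution is visible even in a toy meta-learner. The sketch below meta-learns an initial weight for one-dimensional regression tasks y = a x, with the slope a drawn from a hypothetical range of 1.5 to 2.5. It uses a Reptile-style first-order update, chosen for brevity; this is not claimed to be the method of the works cited above.

```python
import random
from typing import List

XS: List[float] = [0.5, 1.0, 1.5]   # shared inputs for every toy task


def adapt(w: float, slope: float, lr: float = 0.1, steps: int = 5) -> float:
    """A few gradient steps on mean squared error for the task y = slope * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - slope * x) * x for x in XS) / len(XS)
        w -= lr * grad
    return w


def reptile(meta_steps: int = 2000, meta_lr: float = 0.1, seed: int = 0) -> float:
    """Reptile-style meta-training: nudge the initialisation toward task-adapted weights."""
    rng = random.Random(seed)
    w0 = 0.0
    for _ in range(meta_steps):
        slope = rng.uniform(1.5, 2.5)          # every task drawn from the same distribution
        w0 += meta_lr * (adapt(w0, slope) - w0)
    return w0


w0 = reptile()
in_dist_error = abs(adapt(w0, 2.4) - 2.4)       # a new task from the training distribution
out_dist_error = abs(adapt(w0, -3.0) - (-3.0))  # a task outside it
print(round(w0, 2), round(in_dist_error, 3), round(out_dist_error, 3))
```

The learned initialization settles near the mean slope. Adaptation is therefore quick for tasks resembling those seen during meta-training and slow for the out-of-distribution task. The toy yields no transferable abstraction, only a good starting point.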


4.3 Reflection 


In the previous two sections, we made the case for compositionality and strong 
typing as necessary properties for representing knowledge for general intelligence. 
This section is concerned with what is needed in order to adapt that knowledge to 
make it more accurate and comprehensive. In deep learning, this process is handled as 
an optimization problem with the target being minimal error, which fits neatly with a 
purely numerical class of models. The incorporation of symbols via typed expressions 
complicates this but also offers new opportunities, which we now discuss. 
The notion of scalability is applied in various ways. In this section we will draw
attention to scalability in terms of sample efficiency with respect to training data. 
We expect that as an agent grows more intelligent, it should be able to evaluate and 
compare increasingly complex models with roughly the same efficiency as when 
it was less developed and learning about simpler phenomena. This is one of the 
merits of the scientific method: given two competing theories for physical reality, 
e.g. Newtonian mechanics and Einsteinian relativity, a single experiment (indeed, 
even a ‘thought experiment’) may suffice to decisively favor one model over the 
other.¹

In this section we first characterize what learning looks like for deep neural net- 
works and consider the choices that researchers make in order to support this sort 
of learning. Ultimately, we find that the resulting formulation is not well suited for 
the kind of rapid, scalable learning that general intelligence requires, and we present the
corresponding claim below: 


Claim 3: DL and Reflection 


Lacking compositionality and strong typing, deep learning also cannot support 
meaningful reflection over proposed models of its target domain. The property 
of reflection allows for direct, structured updates to knowledge, which compares 
favorably to deep learning’s exponentially growing requirements for data and 
an undesirable dependence on end-to-end model design choices. 


Knowledge and Optimization in Neural Networks 

Much effort has been directed toward developing networks and optimization procedures
which result in reliable training. This investment has paid off, since deep
neural networks are universal function approximators: a class of models that can 
approximate any continuous function to arbitrary precision [56, 198]. Regardless, 
training remains nontrivial because of the high-dimensional non-convex optimization 
involved. Consequently, neural networks continue to get larger at a rapid pace (e.g. 
[313]), often resulting in dramatic overparameterization. Work has shown that models are
able to fit almost any set of input-output pairs [203], including completely
randomized ones [374]. Other interesting examples include the ability to compress 
networks while retaining accuracy [155] as well as the ‘lottery ticket hypothesis’ 
[93] which states that large networks regularly have sparse subnetworks of only one 
tenth the size, which themselves can achieve equal test accuracy. Analysis of neural 
networks from an information-theoretic perspective [302] shows that generalization 
follows from a great deal of internal data compression, which is also consistent with 
the notion that networks are larger than they need to be. 

Gradient-based optimization can be considered a very simple form of reflection, 
wherein credit is assigned to individual parameters with respect to the error. For 
models using typed and compositional features, changes require maintenance of 
the associated semantics. Whereas the knowledge of deep learning models is tuned 


¹ When contextualized appropriately by scale, of course.
in subtle and unpredictable ways, the ability to reflect on representations provides
a basis for ensuring semantic consistency. We argue this stability during learning 
and amenability to direct updates are necessary properties of learning in general 
intelligence. 
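Credit assignment by gradient can be spelled out for the smallest possible model. The sketch below uses a hypothetical three-weight linear model with squared error. Each weight's credit is the partial derivative of the loss with respect to it, and one gradient step changes every weight whose input is non-zero.

```python
from typing import List


def predict(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def credit(w: List[float], x: List[float], target: float) -> List[float]:
    """dL/dw_i for L = (w . x - target)^2: the error is shared out in proportion to x_i."""
    error = predict(w, x) - target
    return [2.0 * error * xi for xi in x]


def gradient_step(w: List[float], x: List[float], target: float, lr: float = 0.1) -> List[float]:
    return [wi - lr * gi for wi, gi in zip(w, credit(w, x, target))]


w = [0.5, -0.2, 0.1]
x = [1.0, 2.0, -1.0]
print(credit(w, x, target=1.0))         # roughly [-2.0, -4.0, 2.0]
print(gradient_step(w, x, target=1.0))  # roughly [0.7, 0.2, -0.1]
```

Nothing in this update consults what any weight is supposed to mean. With typed, compositional representations, an update would additionally have to preserve the associated semantics, which is the sense of reflection the text has in mind.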

Furthermore, consider that deep learning is designed to result in convergence of 
the values of parameters. After convergence, adaptation to new tasks or modifying 
the knowledge is difficult and/or costly. Hence, deep learning with gradient descent 
does not adequately account for the necessity of open-ended learning for general 
intelligence. A more reflective learning approach is necessary to re-calibrate existing 
representations and render them actionable with respect to new goals and constraints. 
After some exposure to the new domain, a reflective approach can consolidate two 
models which share underlying similarities or abstractions back into a single, self- 
consistent and more general model. We explore more details and examples of these 
propositions in Chap. 9. 


4.4 Implications and Summary 


We reiterate that Claim 1 given at the beginning of this chapter does not detract from
the merits of deep learning. To date, these methods have far outshone their prede- 
cessors in their ability to learn features from observational data and make valuable 
predictions from them. Most standard neural network architectures are also very 
amenable to parallelization and acceleration, making them practical for their cur- 
rent use cases. Hence, for some, the preceding discussion may not provide sufficient 
impetus to look beyond deep learning. For many practical applications in narrow 
domains, it suffices to have training methods which are applicable in the presence 
of relatively massive computational resources. In theory, recent work [81, 99, 236, 
278, 309, 375] could yield representations which are causally disentangled and enjoy 
greater compositionality, but it seems likely that even the exponents of such forms 
of causal representation learning would admit that much progress is yet to be made. 

We therefore recall the motivation which opened this chapter: the challenges for 
deep learning we have discussed arise in the context that the paradigm was not 
designed with general intelligence in mind. We can tie the challenges of machine 
learning together as being the symptoms of a ‘narrow framing of the problem’. The 
most salient part of the framing is that deep learning model parameters are intended 
to converge to some satisfactory optimum given the dataset and iterative learning 
procedure. Training a model with a priori knowledge of the desired outcome is 
fundamentally at odds with the notion of open-ended learning [340], an essential 
part of general intelligence. 

It should be emphasized that, even for the task of constructing general intelli- 
gence, we do believe that deep learning may be the most sensible and practical way 
to implement very basic layers of perception on high-dimensional sensory inputs 
such as visual and audio feeds. Although we have emphasized the importance of 
compositionality and strong typing, we also acknowledge that they may not always 
be applicable at the level of individual pixels or waveform amplitudes. Instead, it
should be clear that compositionality and strong typing become increasingly rele- 
vant when the subjects being represented can usefully be compressed into ‘concepts’ 
or ‘expressions’ rather than mere sensory samples. 
Chapter 5
Challenges for Reinforcement Learning


Theories destroy facts. 


Peter Medawar [221] 


Reinforcement learning was historically established as a descriptive model of learning
in animals [234], [324], [32], [279], then recast as a framework for optimal control
[331]. The definition of RL has been progressively expanded to be so broad and
unconstrained [143] that virtually any form of learning can ultimately be described 
as RL. This has led some to consider RL as a plausible candidate for unifying cogni- 
tion at large, hence subsuming general intelligence. In this chapter and the next, we 
identify fundamental issues that challenge this view. In summary: 


• The notion of a fixed a priori reward function acts counter to open-endedness:
assumptions of stationarity and the validity of ‘once and for all’ behavioral specifications
are simply not adequate for open-ended behavior in the real world.

• For both efficiency and safety reasons, the notion of reward sampling prevents RL
from being performed in the real world.

• The notion of policy conflates knowledge (world models) and motivation (goals)¹
via the direct mapping from states to actions: this opposes continual world modeling
and plan composition.


To its credit, function approximation using ‘deep’ neural architectures has advanced
the capacity of RL substantially beyond the capabilities of the original tabular setting.
However, as described in Chap. 4, deep function approximation has serious
limitations. 


¹ Even though model-based RL leverages world models during learning, its end-product still remains
a policy.
Fig. 5.1 In this chapter we argue that the centrality in RL of a priori fixed reward functions opposes
key requirements for general intelligence


Claim 4 


Reinforcement learning techniques which use deep neural networks for function 
approximation will inherit the issues stated in Chap. 4. 


As per Sect. 2.1, we center our requirements around the continual adaptation of a 
world model to achieve a variety of arbitrary goals. We also stipulate that (1) learning 
without forgetting should be relatively cheap and (2) safety must be certifiable. 


5.1 A Priori Reward Specification 


Arguably one of the most central concepts in RL is the value function, which usually
expresses the value of a state or state-action pair in terms of the expected return
under some policy starting from that state (recall from Sect. 3.1 that a policy is
simply a distribution over actions conditioned on states). The iteration of value functions
toward a fixed point makes sense because the reward in RL is specified beforehand 
and is assumed to remain stationary. If we entirely commit to this strategy, then the 
only way to become more general is to widen stationarity over broader notions of 
tasks and environments. Learning is then extended to maximize reward over an entire 
distribution of MDPs [293], [305], [277] which share a state and action space whereas 
the reward function is selected from a distribution. These modifications, which are 
examples of meta-RL, purportedly allow for an agent to learn how to succeed over a 
variety of tasks, and would seem to be the most developed route in fully stationary 
RL towards general intelligence. 
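The fixed-point view of value functions can be illustrated with iterative policy evaluation on a hypothetical two-state MDP. The Python sketch below implements the textbook Bellman expectation backup; it is not code from any cited work. The states, actions, rewards and discount factor are illustrative.

```python
from typing import Dict

Policy = Dict[str, Dict[str, float]]                  # pi(a | s)
Transitions = Dict[str, Dict[str, Dict[str, float]]]  # P(s' | s, a)
Rewards = Dict[str, Dict[str, float]]                 # r(s, a), fixed a priori


def evaluate_policy(policy: Policy, transitions: Transitions, rewards: Rewards,
                    gamma: float = 0.9, tol: float = 1e-10) -> Dict[str, float]:
    """Iterative policy evaluation: apply the Bellman expectation backup
    V(s) <- sum_a pi(a|s) [r(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
    until successive value estimates differ by less than tol."""
    values = {s: 0.0 for s in policy}
    while True:
        new_values = {
            s: sum(
                p_a * (rewards[s][a]
                       + gamma * sum(p * values[s2] for s2, p in transitions[s][a].items()))
                for a, p_a in policy[s].items()
            )
            for s in policy
        }
        if max(abs(new_values[s] - values[s]) for s in policy) < tol:
            return new_values
        values = new_values


# Hypothetical two-state MDP: moving out of B pays 1, everything else pays 0.
policy = {"A": {"stay": 0.5, "move": 0.5}, "B": {"move": 1.0}}
transitions = {"A": {"stay": {"A": 1.0}, "move": {"B": 1.0}}, "B": {"move": {"A": 1.0}}}
rewards = {"A": {"stay": 0.0, "move": 0.0}, "B": {"move": 1.0}}

V = evaluate_policy(policy, transitions, rewards)
print(V)  # close to the exact fixed point A = 90/29, B = 110/29
```

The fixed point exists, and is worth iterating toward, only because `rewards` does not change. If the reward for leaving B were revised after convergence, every value would be stale and evaluation would have to start again.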

Notwithstanding those extensions, the value function is still an ideal place to 
scrutinize the foundations of RL. The assumptions of RL make it suitable to apply to 
problems where there is a naturally occurring quantitative objective to be maximized 
and there are ‘operationalized’ actions [11], as exemplified by e.g. video games,
where even random behavior has non-negligible correlation with success. Even then, 
the RL engineer may have to do significant work to make a dense, shaped reward 
from a sparse reward, in order to obtain a sufficiently rich reward signal. This is not 
all that different from having to invest effort into designing the architecture of neural 
networks to accommodate inductive biases and ease nonconvex optimization. In the 
context of general intelligence, we therefore present the following claim: 


Claim 5 


The notion of a fixed a priori reward function acts counter to open-endedness: 
assumptions of stationarity and sufficient human foresight cannot be guaranteed 
to hold in the domains where intelligent agents are expected to operate. 


We argue that since open-ended learning is a necessary trait of general intelligence, 
there cannot be any true fixed-point optimality condition to achieve. It might be 
possible that certain primitives can be hard-coded as policies over state observations, 
such as for basic locomotion. However, it is at the very least counter-intuitive that 
the higher-order cognition for achieving a variety of goals might be expressible as an 
optimization over a very wide distribution. Instead, we must dismantle the assumption 
that rewards are stationary and that states (or state-action pairs) accordingly have a 
pre-defined value. 

Indeed, the literature on prospects for continual learning in RL highlights this same 
point [171]. In this context, general intelligence may be approached as mastering 
a sequence of MDPs rather than a distribution. Within each one, there is still an 
assumption of a priori optimality, and this may perhaps make sense for individual 
tasks which are suitably framed as optimization problems. The more general scenario, 
however, involves tasks which the user himself may wish to modify/extend/retract 
as time goes on. Naturally, a framework for task specification should support the 
modification of a specification without requiring that it be treated as an entirely 
new one. This could be partially enabled by having reward functions which are 
compositional, which is explored to a degree in literature on general value functions 
and scalarized multi-objective RL [350]. 
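One way to read ‘compositional reward functions’ is as a linear scalarization of a vector of objectives, as in scalarized multi-objective RL. In the minimal sketch below, the objective names, state fields and weights are hypothetical. Changing the specification means editing a weight vector rather than writing a new reward function.

```python
from typing import Callable, Dict

State = Dict[str, float]

# Hypothetical component objectives for a domestic robot.
OBJECTIVES: Dict[str, Callable[[State], float]] = {
    "tidiness": lambda s: s["items_put_away"],
    "quiet": lambda s: -s["noise_db"],
    "energy": lambda s: -s["battery_used"],
}


def scalarized_reward(state: State, weights: Dict[str, float]) -> float:
    """r_w(s) = sum_i w_i * r_i(s); objectives without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * objective(state)
               for name, objective in OBJECTIVES.items())


state = {"items_put_away": 3.0, "noise_db": 40.0, "battery_used": 2.0}
daytime = {"tidiness": 1.0, "energy": 0.5}
nap_time = {**daytime, "quiet": 0.1}   # amended spec: same components, one new weight

print(scalarized_reward(state, daytime))   # 2.0
print(scalarized_reward(state, nap_time))  # -2.0
```

The amendment is cheap only because ‘quiet’ was already a component. A genuinely new concern still needs a new component, so scalarization enables the kind of modification the text calls for only partially.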

This challenge of modifying a priori reward specifications takes on even greater 
importance in the AI safety literature through the core issues of misspecification and
alignment [4], [82], [128], [292]. Even if we assume that humans have a fixed reward 
function (or distribution or sequence thereof), the challenge of building a powerful 
general intelligence aligned with that becomes extremely daunting. Inverse rein- 
forcement learning [1] attempts to learn a (parametric) reward function for situations 
where an analytic description is a poor fit for human concerns. As with the usual a 
priori case, this reward function is taken to be essentially stationary and so the same 
concerns arise [129], [130], [195]. Ultimately, this may prove futile, since humans 
exhibit preferences which do not conform to the Von Neumann-Morgenstern axioms 
[353] of rationality, and hence cannot be said to possess a stationary utility function 
[80]. In any case, it is clearly desirable to avoid a formulation of general intelligence 
which explicitly requires a proxy for desired states and would consequently be vul-
nerable to Goodhart’s Law: “When a measure becomes a target, it ceases to be a 
good measure.” [122]. 
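The link between intransitive preferences and the absence of a utility function can be checked mechanically. For a finite set of strict pairwise preferences, a utility that ranks every preferred item strictly higher exists exactly when the preference graph has no cycle. The sketch below tests this with Kahn's topological sort. It covers only transitivity, one of the von Neumann-Morgenstern axioms, and the items are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def utility_from_preferences(
    prefs: Iterable[Tuple[Hashable, Hashable]]
) -> Optional[Dict[Hashable, int]]:
    """Given strict preferences as (better, worse) pairs, return an integer
    utility u with u[better] > u[worse] for every pair, or None when the
    preferences contain a cycle, in which case no such utility exists."""
    prefs = list(prefs)
    items = {item for pair in prefs for item in pair}
    beats: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    preferred_over = {item: 0 for item in items}   # how many items beat this one
    for better, worse in prefs:
        if worse not in beats[better]:
            beats[better].add(worse)
            preferred_over[worse] += 1
    # Kahn's algorithm: repeatedly remove an item that nothing remaining beats.
    frontier = [item for item in items if preferred_over[item] == 0]
    order = []
    while frontier:
        item = frontier.pop()
        order.append(item)
        for worse in beats[item]:
            preferred_over[worse] -= 1
            if preferred_over[worse] == 0:
                frontier.append(worse)
    if len(order) < len(items):
        return None                                 # a preference cycle remains
    return {item: len(order) - rank for rank, item in enumerate(order)}


print(utility_from_preferences([("tea", "coffee"), ("coffee", "juice")]))
# {'tea': 3, 'coffee': 2, 'juice': 1}
print(utility_from_preferences([("tea", "coffee"), ("coffee", "juice"), ("juice", "tea")]))
# None
```

A cyclic pattern such as tea over coffee, coffee over juice and juice over tea cannot be represented by any utility function. This is the formal sense in which preferences violating the axioms leave no stationary utility to optimize.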

We believe that RL is given credit for much of the work that is actually attributable 
to engineers through their intelligent design of reward functions and model archi- 
tectures. More fundamentally, we argue that a framework for general intelligence is 
better founded if it does not assume that rewards (or more generally, objectives) can 
ever be fully known, even by the stakeholders. While some things can be optimized 
to a fixed point, a generally intelligent agent intended to replace human labor would 
not benefit from such essentially stationary notions. For example, domestic robots 
that are expected to work well around growing children must be able to accommodate 
ever-changing constraints and preferences from the family as its youngest members 
grow up. In fact, it becomes increasingly evident that, in the context of general intel- 
ligence, the very concept of ‘reward function’ ends up creating more problems than 
it solves. This notion is explored further in Sect. 6.1. 


5.2 Sampling: Safety and Efficiency 


In Sect. 4.3, we emphasized that reflection is a key requirement for knowledge rep- 
resentation. As with our preceding discussion of deep supervised learning, we have 
identified that feedback mechanisms are a key bottleneck in RL. In the former, we 
focused on the limitations of iteratively updating neural networks through gradient 
descent. Here, we discuss an analogous issue which is fundamental to RL, irrespec- 
tive of reliance on function approximation using deep architectures. In essence, in 
the MDP formalism the agent receives a reward at each timestep as a result of its 
state and chosen action, and hence, goals (taken to be regions of the state space) are 
only expressed via sampling. Note that RL based on model-predictive control (MPC) 
often takes a reward function in closed form, visible to the agent, though this still 
leaves open the problem of finding its optima. 

For arguably the majority of human activities we wish to automate, we take 
advantage of the fact that we can interpret our desired goals prior to the need to 
sample any information from the environment. Granted, this requires sufficient world 
knowledge to make sense of a goal description, but in such a case, this approach 
confers multiple benefits. We explore ways to make use of this perspective in Sect. 6.1. 
Here, we highlight the undesirable side effects of a sample-based approach to goal 
descriptions, of which RL reward feedback is an example. The key point is given in 
the following claim: 
Claim 6


The reliance on sampled rewards means that it is inherently unsafe to train RL 
systems in the real world. Even if we approached this with very high fidelity 
simulations, RL would need to exhibit good sample efficiency, which it cur- 
rently does not. 


For humans, a reward function can in principle be made intensional: a ‘white-box’ 
function of state and action. In contrast, the RL procedure incrementally constructs 
an extensional description of the reward, and starts out entirely blind in its sampling. 
This raises serious concerns for the safety of systems developed in this way. Recent 
work has emerged which tries to specifically address safe exploration in deep RL 
[282], [24], [83], [181], but the fundamental issues of using deep neural networks 
for estimation and having to sample during training remain. Of course, one natural 
response would be that we can use resettable simulations to safely train agents until 
we certify them ready to act in the real world. The use of simulation may appear 
to be a panacea, but this strategy simply delays confronting the issue, analogous to 
our observations on domain randomization in Sect. 4.2. The primary challenge is the 
fidelity of the simulation. At a minimum, there is the issue of verisimilitude in the
behavior of physical objects, with well-known precision issues [326]. More daunting 
is the credible simulation of other agents. 
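The intensional/extensional contrast can be put in code. Below, a hypothetical grid-world reward is written as a white-box function that can be queried about a hazard without ever visiting it. Alongside it is a table that, like a sample-based learner, knows a reward only after the corresponding state-action pair has actually been tried. All names and values are illustrative.

```python
from typing import Dict, Optional, Tuple

State = Tuple[int, int]   # hypothetical grid cell (x, y)


def intensional_reward(state: State, action: str) -> float:
    """White-box reward: goal and hazard are readable before any interaction."""
    x, y = state
    if x == 0 and action == "left":
        return -100.0     # stepping off the left edge, e.g. over a cliff
    if (x, y) == (3, 3):
        return 1.0        # goal cell
    return 0.0


class SampledReward:
    """Extensional description: a reward is known only once it has been experienced."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[State, str], float] = {}

    def record(self, state: State, action: str, reward: float) -> None:
        self.table[(state, action)] = reward

    def lookup(self, state: State, action: str) -> Optional[float]:
        return self.table.get((state, action))   # None means never sampled


learner = SampledReward()
print(intensional_reward((0, 2), "left"))  # -100.0, known in advance
print(learner.lookup((0, 2), "left"))      # None: only trying it would tell
```

To fill in that table entry, the learner has to take the dangerous action at least once, in the world or in a simulator, which is the safety problem Claim 6 describes.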

Moreover, the sample efficiency of deep RL agents is a pressing concern. Sample 
efficiency is a performance measure which is inversely proportional to the number 
of samples that the system needs to obtain from the environment in order to achieve 
success at some objective, e.g., loss, accuracy. Weak sample efficiency is countered in 
the deep RL community by an increasing reliance on extremely large computational 
resources to train on vast amounts of data. The associated engineering remedies are 
hence superficial and reduce visibility of the underlying obstacles. 

Deep RL’s most high-profile achievements are some of the best examples to show- 
case the growing challenge of sample efficiency. OpenAI Five [22] was trained 
with roughly 40,000 years of compressed real-time experience over the course of 
ten months. AlphaStar [351] made use of more sophisticated inductive biases and 
replaced self-play [17] with population-based training [156] but still required roughly 
200 years of real-time gameplay per agent, which was parallelized over massive 
resources in order to achieve a training time of 14 days. In a task of dextrous object 
manipulation, OpenAI’s Rubik’s Cube agent [247] required 13,000 years of com- 
pressed real-time simulation to learn a single task. Note how the single task required 
a similar amount of experience to that required for a highly complex multi-agent 
video game for OpenAI Five, clearly showing that the former is nontrivially more 
challenging for RL to handle. 
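To put these figures in perspective, the sketch below converts them into the implied ratio of simulated experience to wall-clock training time. It uses only the numbers quoted above and treats ten months as 10/12 of a 365.25-day year. As the text notes, the AlphaStar figure is per agent.

```python
DAYS_PER_YEAR = 365.25


def speedup(experience_years: float, wall_clock_days: float) -> float:
    """Years of simulated experience divided by elapsed training time."""
    return experience_years * DAYS_PER_YEAR / wall_clock_days


# Figures as quoted in the text.
openai_five = speedup(40_000, wall_clock_days=10 * DAYS_PER_YEAR / 12)  # ten months
alphastar_per_agent = speedup(200, wall_clock_days=14)
# The Rubik's Cube figure (13,000 years) is quoted without a training time,
# so no ratio is computed for it.

print(round(openai_five))          # 48000
print(round(alphastar_per_agent))  # 5218
```

Ratios of this size are attainable only through massive parallel simulation. This is the sense in which large computational resources mask weak sample efficiency.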

In the following chapter, we proceed to address the issue of a priori rewards by 
proposing a framework for Work on Command. 
Chapter 6
Work on Command: The Case for Generality


Workin’ 9 to 5, 

What a way to make a living, 

Barely gettin’ by, 

It’s all takin’ and no giving, 

They just use your mind and they never give you credit, 
It’s enough to drive you crazy if you let it. 


Dolly Parton, ‘9 to 5’ [251]


Let us recall that, from a pragmatic perspective, AI is nothing more or less than a 
tool for implementing a new leap in automation. This pragmatic perspective on AI 
is the one that matters in the current world economy, and therefore will necessarily 
receive primacy for development. While we do acknowledge AGI-related notions such
as ‘artificial life’, a significant business case for ‘AI as organism’ has yet to be 
demonstrated, and therefore we consider AI that does not directly seek to deliver 
automation to be out of scope. 

As discussed in Sect. 3.2, applications in automation with a known and well-defined
target or utility function can be addressed using reinforcement learning: RL
allows one to optimize a controller to perform exactly that task with guarantees on 
speed and accuracy. However, training policies is difficult and time consuming, and 
this hampers an orthogonal class of applications that require the minimization of 
the cost/latency incurred by engineer-then-deploy cycles. There is indeed a recurrent 
need in industry to streamline automation engineering and, in particular, to capitalize 
on learning processes, recognizing that knowledge acquired from automating one 
process can be useful for automating the next. Moreover, techno-economic conditions
for business sustainability are shifting rapidly, as exemplified by the Industry 4.0
initiative in Europe [74, 141]: production lines need to absorb increasing levels of
flexibility, meaning that processes are irreversibly moving away from the stationarity
that was once the norm. For example, a system may be tasked with controlling the
assembly process of product X, but after a few months X is replaced by new-and- 
improved product Y, similar to X in some ways. Now we would like to tell the 
system: 

Stop assembling X immediately, here’s a specification of Y, and here are most of your old
and a few new effectors. Now start assembling Y, avoiding such and such kinds of defects
and wastage.


We use the notion of ‘work on command’ to refer to the ability of a system to 
respond, at any time, to changes in task specifications, both positive (goals to be 
achieved) and negative (constraints to be respected). To be of any use, we include 
in this notion the ability to leverage all relevant knowledge from prior tasks with 
little effort. Such leveraging should be non-destructive, i.e., it must be possible to 
command the system to resume an earlier task, on which it should in general be able 
to perform at least as well as before. 

We posit that performing work on command requires general intelligence. Keeping
the pragmatic perspective focused on automation engineering, a system’s generality
can be measured as the inverse of the cost of its deployment and maintenance
in a given range of real-world task/environment spaces.¹ It should be clear that the
more general an AI system, the better (and cheaper) it will be at performing work on
command.

We have seen in Sect. 3.1 that the function of RL is to compile a policy consisting 
of a direct mapping from environment states to actions. In this paradigm, behavior 
is the (fixed) computation of a response to a stimulus, best attuned to the assumed 
specifications of the task and the environment. This notion of ‘behavior as a curried 
planner’ offers the benefits of speed and accuracy: once learned, a policy needs 
little computation to deliver optimal results. However, this comes at the cost of 
brittleness: adaptation is impossible should the task or environment escape the initial 
assumptions after deployment. But it does not have to be like this. In cybernetics,
system theory, and psychology, behavior is better described as “a control process
where actions are performed in order to affect perceptions”, as noted by Cisek in his
critique of computationalism [49] (see also von Uexküll [191]). Since it is a process,
a behavior can be more easily adapted to the variability of its goal, environment,
and contingencies. For this reason, we consider processes, not algorithms, as the
concept most constructive for the synthesis of intelligent systems, as opposed to
purely reactive systems; see also “The Irrelevance of Turing Machines to AI” by
Sloman [316].
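The contrast between ‘behavior as a curried planner’ and behavior as a process can be illustrated in a few lines. This is a toy sketch under our own assumptions. `plan` is a hypothetical one-step planner on a one-dimensional line. `functools.partial` stands in for compiling a policy, which amounts to fixing the goal in advance. The process variant re-reads the goal at every decision.

```python
from functools import partial
from typing import Callable

def plan(goal: int, state: int) -> int:
    """Hypothetical planner on a 1-D line: pick the action in {-1, 0, +1}
    whose successor state is closest to `goal`."""
    return min((-1, 0, 1), key=lambda a: abs(goal - (state + a)))

# 'Behavior as a curried planner': the goal is baked in once, ahead of deployment.
policy = partial(plan, 5)

# 'Behavior as a process': the goal is looked up afresh at every decision.
def process_step(current_goal: Callable[[], int], state: int) -> int:
    return plan(current_goal(), state)

spec = {"goal": 5}
print(policy(0), process_step(lambda: spec["goal"], 0))  # 1 1
spec["goal"] = -3  # the user changes the task specification
print(policy(0), process_step(lambda: spec["goal"], 0))  # 1 -1
```

After the specification changes, the curried policy keeps pursuing the old goal until it is recompiled. The process adapts at its next step.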

From this perspective, agents have to learn a world model which allows for
dynamic planning through simulated trajectories.² The value proposition is that this
model is task-agnostic, making it a general-purpose corpus of knowledge that can be
reused for achieving a broad variety of goals and shared with other agents to hasten
their learning.


1 One might argue that generality is better defined as inversely proportional to computational effort
than to monetary cost. However, both involve empirically-determined (but in general, arbitrary)
resource expenditure. From the pragmatic, real-world perspective taken here, we find money the
more suitable unit of measurement.

2 As noted before, even if model-based RL leverages world models while learning policies, the final
outcome remains a fixed behavior.

The general requirement to perform work on command can be broken down into
three main components:


• To handle explicit specifications of goals/constraints and make provision for their
possible change.

• To plan dynamically using world models which may change during planning.

• To deliver plans and learn anytime, i.e., asynchronously with regard to the system
activity.


These requirements add to the challenges for RL discussed previously and call for
a significant change to its conceptual basis: in the next two sections, we describe how
RL can be viewed as an ‘artificially constrained’ version of an extended framework
which we term WoC-RL (WoC for work on command). The purpose of WoC-RL
is purely pedagogical: it serves to introduce aspects of the subsequently described
‘Semantically Closed Learning’ from the familiar perspective of RL. In the last
section, we depart from the ML algorithmic world view and propose a process-centric
perspective on system agency to address the requirement of anytime operation.


6.1 Goals and Constraints 


The pedagogical purpose of the WoC-RL exercise is to imagine a version of RL 
controllers which would be robust to change and capable of adapting their behavior 
on the job. For this reason, WoC-RL makes the idealistic assumption that a controller
(hereafter, ‘the agent’) performs RL endogenously³ after deployment.

The RL procedure operates in accordance with the formulation given in Sect. 3.1. 
The basic approach can be extended to accommodate additional complexity such as 
partial observability, stochasticity, and multiple agents. The objective for RL algorithms
is to maximize returns, defined as the sum of (discounted) rewards:

$$\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)$$

where $\gamma$ is the discount factor.

WoC-RL generalizes the objective of RL by instead seeking to achieve goals.⁴ This
is a more prescriptive notion than goal-conditioned RL, an extension of RL in which
decisions and value estimates are conditioned on a goal state or an embedding thereof
(sometimes via universal value function approximation [305]). Goal-conditioned
RL is generally used either to separate out subtasks and treat them as distinct learning
objectives to hasten the learning of a single complex task [6, 161, 237], or to
parameterize a set of related tasks for multitask learning [322, 371]. Regardless, there
is no alteration to the reward structure: it is defined a priori and is sampled from every
state because, while the ‘goal’ is known, the reward is not.


3 We leave aside the issue of intrinsic motivation.
4 Note that WoC-RL is still subject to the limitations described in Claim 4 of Chap. 5.

The goals in WoC-RL effectively replace the notion of reward to become the 
sole motivation for agent action. Whilst state space regions defined by these goals 
can indeed be equipped with some associated quantity indicating desirability (which 
could therefore be said to constitute a ‘reward’ when reached), the set of state space 
regions from which this reward can be earned is explicitly specified, thus obviating 
the need for pointwise sampling of rewards. As such, a WoC-RL agent has access,
at any real-valued time $t$, to the following goal structure $G_t$:


$$G_t = \{(S^1, T^1, R^1), (S^2, T^2, R^2), \ldots\} \qquad (6.1)$$


where each $S^i$ is (a partial specification of)⁵ a state, each $T^i$ is a time interval, and
each $R^i$ is a positive or negative real. Having access to $G_t$, the agent knows which
(future) states are promised to give a reward or punishment when visited during a
specific time interval. If the current state $S_t$ matches a rewarding state (i.e., there
exists $(S^i, T^i, R^i) \in G_t$ such that $S_t \sqsupseteq S^i$, $t \in T^i$, and $R^i > 0$), a goal is said to
be achieved; if the current state matches a punishing state ($R^i < 0$), a constraint is
said to be violated. The relative values of $R^i$ are only useful to an agent in that they
aid prioritization between different goals.
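One concrete encoding of the goal structure and of the achieved/violated test might look like the sketch below. The encoding is our own assumption: a state is a map from attribute names to values, and a partial state fixes only some attributes. Time intervals are taken to be closed. `GoalEntry`, `matches` and `assess` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = Dict[str, object]  # assumption: a state maps attribute names to values

@dataclass
class GoalEntry:
    """One tuple (S^i, T^i, R^i) of the goal structure G_t."""
    pattern: State                 # S^i: a possibly partial state specification
    interval: Tuple[float, float]  # T^i: closed interval [start, end]
    value: float                   # R^i: > 0 for a goal, < 0 for a constraint

def matches(state: State, pattern: State) -> bool:
    """Every attribute fixed by the pattern is present in `state` with the same value."""
    return all(k in state and state[k] == v for k, v in pattern.items())

def assess(state: State, t: float, goals: List[GoalEntry]):
    """Split the entries matched at time t into achieved goals and violated constraints."""
    hits = [g for g in goals
            if matches(state, g.pattern) and g.interval[0] <= t <= g.interval[1]]
    return [g for g in hits if g.value > 0], [g for g in hits if g.value < 0]

G = [GoalEntry({"product": "Y", "assembled": True}, (0.0, 100.0), 1.0),
     GoalEntry({"defect": True}, (0.0, float("inf")), -5.0)]
achieved, violated = assess({"product": "Y", "assembled": True, "defect": False}, 42.0, G)
print(len(achieved), len(violated))  # 1 0
```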

The idea is that the tuples in G will typically persist over long time spans. Nonetheless,
in accordance with the requirement to perform work on command, a user can
at any time modify G by adding or deleting state space regions: either goal states,
in order to ‘command’ the agent to achieve things, or forbidden regions, in order to
‘warn’ the agent. Assuming that the user is not perfectly wise in specifying the work,
G may contain just one or a few goals at any time, but ever more numerous or
detailed constraints as the user’s insight increases.

The expected value of the reward of a WoC-RL agent for taking action a in state 
s can be defined as in traditional RL [331]: 


$$r(s, a) = \mathbb{E}[R_t \mid S_{t-1} = s, A_{t-1} = a] \qquad (6.2)$$


but WoC-RL additionally defines the reward signal $R$ as being completely specified
by $G$:

$$R_t \mid G = \sum_{\substack{(S, T, R) \in G_t \\ S_t \sqsupseteq S,\ t \in T}} R \qquad (6.3)$$

where $R_t \mid G$ reads as ‘the reward given the goals’. The long-term return as well as
the value function can again be defined as is traditional in RL [331]; however, neither
of these notions serves any purpose in WoC-RL.
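Reading (6.3) as ‘the reward at time t is the sum of the values of all tuples whose state pattern matches the current state and whose interval contains t’ gives a reward that is computed from G rather than sampled. The sketch below works under that reading. It encodes G as a list of hypothetical `(pattern, (start, end), value)` tuples, with attribute-map states as before.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of G_t: (partial_state, (start, end), value) tuples.
GoalStructure = List[Tuple[Dict[str, object], Tuple[float, float], float]]

def reward_given_goals(state: Dict[str, object], t: float, goals: GoalStructure) -> float:
    """R_t | G: summed value of every tuple whose pattern matches `state`
    and whose closed time interval contains t."""
    total = 0.0
    for pattern, (start, end), value in goals:
        if start <= t <= end and all(k in state and state[k] == v for k, v in pattern.items()):
            total += value
    return total

G = [({"at": "dock"}, (10.0, 20.0), 2.0),         # goal: be at the dock during [10, 20]
     ({"battery": "empty"}, (0.0, 1e9), -10.0)]   # constraint: never run the battery flat
print(reward_given_goals({"at": "dock", "battery": "ok"}, 15.0, G))  # 2.0
print(reward_given_goals({"at": "dock", "battery": "ok"}, 25.0, G))  # 0.0
```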

The goal structure G can also be viewed as an explicit description of those states for
which a (sparse) reward function would output non-zero values, the only difference
being that this description is explicitly given to the WoC-RL agent without needing
to be sampled. Here we see that RL is a restricted case of WoC-RL, namely where a
(hypothetical) user would only provide instantaneous goals. That is, RL is obtained
if G is restricted as follows:

$$\forall s, t:\ \text{either}\ G_t = \emptyset\ \text{or}\ \exists r \in \mathbb{R}: G_t = \{(s, \{t\}, r)\} \qquad (6.4)$$

This stipulates that rewards are immediate and their distribution is not explicitly
known a priori. Restricted in this way, WoC-RL turns into traditional RL, as the
agent lacks prior information about goals and constraints, never being told ahead of
time when or where a reward might be received.


5 We write $s \sqsupseteq s'$ to mean that $s$ matches $s'$, where $s'$ may be partially specified.
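The restriction that turns WoC-RL back into RL can be checked mechanically on a recorded history of goal structures. This is a minimal sketch. It assumes goal structures are stored per time step as lists of `(state, (start, end), value)` tuples, with a degenerate interval `(t, t)` standing for the singleton {t}. The encoding and the function name are hypothetical. The check also does not verify that the states are fully specified.

```python
from typing import Dict, List, Tuple

Entry = Tuple[object, Tuple[float, float], float]  # (state, (start, end), value)

def is_rl_restricted(history: Dict[float, List[Entry]]) -> bool:
    """True if every G_t is empty or a single tuple whose interval is exactly {t}."""
    return all(
        len(g_t) == 0 or (len(g_t) == 1 and g_t[0][1] == (t, t))
        for t, g_t in history.items()
    )

rl_like = {0.0: [], 1.0: [("s1", (1.0, 1.0), 0.5)], 2.0: []}
woc_like = {0.0: [("assembled_Y", (0.0, 100.0), 1.0), ("defect", (0.0, 1e9), -5.0)]}
print(is_rl_restricted(rl_like), is_rl_restricted(woc_like))  # True False
```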


6.2 Planning 


WoC-RL changes the optimization problem of RL to that of planning: the agent 
must plan to reach the most valuable state space regions—while avoiding the most 
punishing ones—as per the goal structure G, while remaining open to changes in G 
that can occur at any time. 

This framework is inherently task-agnostic since the WoC-RL controller must be 
designed to handle anytime changes in G, whereas RL algorithms typically expect 
rewards to be stationary. As such, the concept of a policy is not particularly helpful to 
WoC-RL as it relates to reward. What WoC-RL needs to do is propagate goals through 
a world model M in order to generate plans. Such a world model is effectively a 
learned transition model, ideally of a causal nature, and even more ideally, allowing 
bidirectional processing. A bidirectional world model allows sensory information 
to propagate forward as predictions (which, when falsified, can trigger learning), 
and goals to propagate backward as abductive planning. Nonetheless, even a purely 
forward world model can be used for deductive planning (disregarding tractability 
issues), which is easier to express formally, and will be the focus of this section. 
Abduction will be treated in more detail in Sect. 9.4. 


Plan generation can be illustrated by defining an anytime planner:

$$\mathit{work} : \mathit{Time} \to A$$


that finds the action with the best return according to its current world model M and 
goal structure G (which are retrieved from an implicit storage). Here it is crucial to 
note that work is fixed, i.e., it is not a learnable policy. As prescribed before, G is 
changeable by the user but not by the agent. That leaves M—the world model—as 
the only adaptable ingredient of WoC-RL. For pedagogical reasons let us formally 
specify work in terms familiar to the RL community: 


$$\mathit{work}(t) = \operatorname*{arg\,max}_{a \in A}\ q_*(S_t, a, t \mid G, M) \qquad (6.5)$$




where $q_*$ is analogous to the optimal action-value function [331], except here it takes
not only state $s$ and action $a$, but also explicitly propagates time $t$, goals $G$, and model
$M$:

$$q_*(s, a, t \mid G, M) = \sum_{s', t'} p(s', t' \mid s, a, t)\,\Big[(R_{t'} \mid G) + \max_{a'} q_*(s', a', t' \mid G, M)\Big]$$

where it is assumed the state transition probability $p$ is included in, or can be derived
from, model $M$.⁶ It is important to note that the goal structure $G$ is propagated
unchanged by the agent even though it can be changed at any time by the user: if that
happens at time $t''$, $\mathit{work}(t'')$ will start using the new goal structure $G_{t''}$, containing
the user’s new goal and constraint specifications. This is in line with the work-on-command
ideal of an agent performing task(s) exactly until the user tells it to do
otherwise.
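To make the roles of G and M tangible, the recursion for q* and the argmax in (6.5) can be run on a tiny tabular world. This is a sketch under assumptions that are ours, not the text’s. There is a finite action set and an explicit transition table standing in for M. Goals are whole states with closed intervals. The recursion is truncated after a fixed horizon, because without discounting it need not terminate. All names are hypothetical.

```python
from typing import Dict, List, Tuple

ACTIONS = ("left", "right")

# Hypothetical world model M: (state, action) -> [(probability, next_state, duration)].
MODEL: Dict[Tuple[int, str], List[Tuple[float, int, float]]] = {
    (0, "left"): [(1.0, 0, 1.0)], (0, "right"): [(1.0, 1, 1.0)],
    (1, "left"): [(1.0, 0, 1.0)], (1, "right"): [(0.8, 2, 1.0), (0.2, 1, 1.0)],
    (2, "left"): [(1.0, 1, 1.0)], (2, "right"): [(1.0, 2, 1.0)],
}

Goals = List[Tuple[int, Tuple[float, float], float]]  # (state, (start, end), value)

def reward(s: int, t: float, goals: Goals) -> float:
    """Reward given the goals: summed value of tuples matched by s at time t."""
    return sum(v for gs, (lo, hi), v in goals if s == gs and lo <= t <= hi)

def q_star(s, a, t, goals, model, horizon):
    """Expected goal reward of doing a in s at time t, then acting optimally;
    truncated after `horizon` steps (our assumption, since there is no discount)."""
    if horizon == 0:
        return 0.0
    total = 0.0
    for p, s2, dt in model[(s, a)]:
        t2 = t + dt  # t' > t
        best_next = max(q_star(s2, a2, t2, goals, model, horizon - 1) for a2 in ACTIONS)
        total += p * (reward(s2, t2, goals) + best_next)
    return total

def work(s, t, goals, model, horizon=4):
    """Fixed (non-learned) planner: argmax over actions of q_star."""
    return max(ACTIONS, key=lambda a: q_star(s, a, t, goals, model, horizon))

print(work(0, 0.0, [(2, (0.0, 5.0), 1.0)], MODEL))  # right: head for state 2
print(work(0, 0.0, [(0, (0.0, 5.0), 1.0)], MODEL))  # left: user now wants state 0
```

Nothing is retrained between the two calls. Only the goal structure changes, and the same model is immediately brought to bear on the new goal.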

The purpose of the above formula is to demonstrate that WoC-RL is not concerned
with sampling rewards or learning a policy. State quality $q_*$ is well-defined
in terms of G and M, which are given to $q_*$. Action sequences (plans) output by
work change if and only if G or M changes. Thus, here we see how WoC-RL avoids
conflating knowledge and motivation, which is crucial for the ability to work on com- 
mand: whenever motivations are (externally) adapted, the latest knowledge should 
immediately be brought to bear to act toward those new motivations. In contrast, the 
classical notion of a policy is as a body of (behavioral) knowledge with respect to 
one target. This offers no provisions for re-applying the same knowledge for new 
and changeable targets, such as when automating a succession of related business 
cases. 

Still for pedagogical reasons, we sketch below the algorithmic context in which 
work can be embedded. The simplest setup is a loop which also updates the world 
model M based on the results of its predictions (gathered in ‘pred’ below). Note 
that changes to the goal structure are supposed to occur externally: work(t) always
retrieves the current $G_t$.


1 pred := ∅;
2 while true do
3 |  t := WallClock.now();
4 |  expired := { s_{t'} ∈ pred | t' < t };
5 |  update M_t if ‘expired’ contains surprises;
6 |  pred := (pred \ expired) ∪ {s_t};
7 |  pred := pred ∪ ⋃{ forward(M_t, s_{t'}) | s_{t'} ∈ pred };
8 |  a := work(t);
9 |  execute(a);


6 Although in RL it is typically the case that $t' = t + 1$, WoC-RL does not require such discrete
time-stepping, assuming instead simply that $t' > t$.




The function forward yields zero or more predictions. Without going into the 
detail of forward, we remark that its predictions could carry a likelihood, which may 
be multiplied for each forward step—and forward may omit predictions below a 
certain likelihood threshold. In that case, line 7 could be performed in a loop until 
‘pred’ no longer changes. 
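The loop can be exercised end to end in a toy setting. The following sketch is our own, not the book’s, and several of its choices are hypothetical:

- the environment;
- the one-parameter model `{"velocity": ...}`;
- the naive re-estimation rule standing in for ‘update M’;
- the bounded iteration count replacing `while true`;
- the prediction horizon that makes the line-7 fixed point terminate.

Comments map statements to the numbered lines of the algorithm.

```python
from typing import Dict, List, Set, Tuple

Prediction = Tuple[int, int]  # (predicted state, time at which it should hold)

def env_state(t: int) -> int:
    """Hypothetical environment: an object moving one cell per tick."""
    return t

def forward(model: Dict[str, int], p: Prediction) -> List[Prediction]:
    """One-step forward prediction with the current model."""
    s, t = p
    return [(s + model["velocity"], t + 1)]

def work(t: int, model: Dict[str, int]) -> str:
    """Placeholder for the anytime planner; always waits in this toy."""
    return "wait"

def run(steps: int = 6, horizon: int = 3) -> Dict[str, int]:
    model = {"velocity": 2}          # deliberately wrong seed knowledge
    history: Dict[int, int] = {}     # observed state at each past time
    pred: Set[Prediction] = set()    # line 1
    for now in range(steps):         # lines 2-3: bounded loop, simulated clock
        history[now] = env_state(now)
        expired = {p for p in pred if p[1] < now}                   # line 4
        if any(history[t] != s for s, t in expired):                # line 5: surprise?
            model["velocity"] = history[now] - history[now - 1]     # naive re-estimate
        pred = (pred - expired) | {(history[now], now)}             # line 6
        while True:                                                 # line 7, to a fixed point
            new = {q for p in pred for q in forward(model, p) if q[1] <= now + horizon}
            if new <= pred:
                break
            pred |= new
        work(now, model)             # lines 8-9: acting is a no-op in this toy
    return model

print(run())  # {'velocity': 1}
```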

We see in this algorithm that $M_t$ is used for making predictions, causing $M_t$ to be
updated whenever they fail. Now we can also see the naivety of work as specified in
formula (6.5): if $M_t$ is used forward for prediction making, then it should also be used
backward for abductive planning. Such abductive planning can proceed iteratively, 
in a similar manner to the prediction making as specified in the above algorithm. 
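Running a model backward from a goal can also be sketched. The assumptions here are ours. The model is deterministic and tabular, with one successor per (state, action) pair. The search is breadth-first over the inverted table. This is a plain backward search standing in for abductive planning, not the mechanism treated in Sect. 9.4, and all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical deterministic model M: (state, action) -> next state.
Model = Dict[Tuple[str, str], str]

def plan_backward(model: Model, start: str, goal: str) -> Optional[List[str]]:
    """Search backward from the goal for an action sequence leading from start to goal."""
    # Invert the model once: next_state -> [(state, action), ...]
    preimage: Dict[str, List[Tuple[str, str]]] = {}
    for (s, a), s2 in model.items():
        preimage.setdefault(s2, []).append((s, a))
    suffix: Dict[str, List[str]] = {goal: []}  # suffix[s]: actions taking s to the goal
    frontier = deque([goal])
    while frontier:
        s2 = frontier.popleft()
        if s2 == start:
            return suffix[s2]
        for s, a in preimage.get(s2, []):
            if s not in suffix:
                suffix[s] = [a] + suffix[s2]
                frontier.append(s)
    return None  # goal unreachable under the current model

M = {("raw", "cut"): "cut_part", ("cut_part", "weld"): "frame",
     ("frame", "paint"): "finished", ("raw", "paint"): "painted_raw"}
print(plan_backward(M, "raw", "finished"))  # ['cut', 'weld', 'paint']
```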

Notice also that the actual work on command, as embodied in lines 8 and 9, can 
be parallelized with the prediction making, since these two lines do not depend on 
the others, only on the objective time. Thus, we see here how we can finally abolish 
the ‘cognitive cycle’, namely the perceive—act—update loop in RL, which is inherited 
from the sense—think—act loop in GOFAI. However, RL still relies on time-stepping 
in lockstep with the environment, dangerously assuming that the environment does 
not change while the agent chooses its next action. The way out is to realize that 
changes in knowledge and motivation do not have to occur at every time-step or at 
set times, and that plans span larger time horizons.⁷ For as long as predictions
succeed, and for as long as the goal structure is not modified by the user, the plans
that were or are being formed can be assumed to remain valid, to the best of the
agent’s knowledge. Thus, a WoC-RL agent could be made to learn, predict, and plan
continually and asynchronously from the environment. An asynchronous open-ended 
continual inference mechanism will be described later in Sect. 7.3. 

Finally, this sketch of WoC-RL illustrates the need for bootstrapping: if M and/or
G were to start out empty, no work would be performed. In the early learning phase, the
user will need to ‘seed’ M with minimal world knowledge, and populate G with 
simple goals that can be achieved with that little knowledge. The user may have to 
act as a ‘teacher’ or ‘mentor’, populating G with progressively more difficult goals 
and constraints. 

Here we have described how a pragmatic perspective on general intelligence 
calls for the engineering of systems that can perform work on command. WoC- 
RL illustrates what such a system might look like, and how it would differ from 
‘vanilla’ RL. A crucial aspect is the decoupling of knowledge (M) and motivation 
(G). Although we have detailed what G may look like technically, we have so far not 
delved into the learnable M. To attain general machine intelligence, both G and M 
must be framed in a different manner than has been customary throughout the history 
of AI, from GOFAI to reinforcement learning. The proposed approach is described 
in the following chapters. 


7 This would require changing the signature of the anytime planner work to return a set of timed 
actions instead of a single one. 
6.3 Anytime Operation


We are concerned with the requirements of deliberative intelligence interacting in
a society of asynchronous actors (whether organic or synthetic) to achieve multiple
goals (temporally overlapping, potentially non-stationary) at arbitrarily long time
horizons (Fig. 6.1). We have determined that such agents must plan over world
models, and this raises the following issue. If the worst-case execution time (WCET)
of consulting a world model exceeds a certain threshold,⁸ the system becomes too
unresponsive to be effective. This is exacerbated by the fact that, in the setting of
open-ended and lifelong learning, world models inevitably grow in size and complexity.
A general solution to this issue is to enforce granularity as a property of the world
model. Rather than treating the world state as monolithic, a system should be able
to devote computation time to consulting only the parts of the world model that are
relevant to its goals. However, we argue that granularity alone is not enough with
regard to world modeling. Even if the WCET of planning is adequate, there is one
final necessary conceptual change. From the perspective of RL, inference is paced by
the wall clock of the environment. Mathematically, this may not make a difference
when compared to the alternative of an internal, asynchronous clock, but again,
the context of real-world situated intelligence changes this. Whereas the WCET of
RL inference is negligible compared to the time scale at which the environment
evolves, the inference WCET of a deliberative agent is potentially subject to high
variability, precluding pacing inferences by the wall clock. Instead, inferences must
be computed in anticipation of the unfolding of world events, and this requires an
internal asynchronous clock: to comply with ‘anytime’ requirements means to align
asynchronous internal deliberations with world-synchronous goals. Just as there is
the study of bounded rationality [362], there is work on the corresponding concept of
anytime bounded rationality [27, 150, 241], and we see that this combination is an apt
way to unify many of the desirable properties of general intelligence. Unfortunately,
common practice in RL is fundamentally incompatible with anytime rationality since
it relies both on synchronous coupling with the environment and on the concept of
‘behavior as a curried planner’. As alluded to previously, anytime operation has not
been considered in most demonstrations of RL to date, but we believe that it will
become a stubborn hindrance for situated control and multi-agent scenarios.


[Figure: diagram relating Requirements, Agent, and Anytime Operation]

Fig. 6.1 Anytime operation is required to deliver relevant, correct action plans on time and
asynchronously (relative to system activity). It enables the economically desirable capability of
work on command


8 Given a goal at time $t$ with a deadline $d$, the WCET of predicting and planning must be less
than $d - t$.
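One familiar way to honour a deadline d set at time t, so that deliberation stays within d − t, is an anytime procedure. Such a procedure always holds a usable answer and refines it only while time remains. The sketch below is our own illustration, not the mechanism proposed in this book. It refines a plan by iterative deepening over action sequences and returns the best plan found when the deadline arrives. The injectable `clock` makes it testable. The scoring function is hypothetical. The exponential growth of `itertools.product` with depth also shows why granular world models matter.

```python
import itertools
import time
from typing import Callable, Optional, Sequence, Tuple

Plan = Tuple[str, ...]

def anytime_plan(actions: Sequence[str],
                 score: Callable[[Plan], float],
                 deadline: float,
                 clock: Callable[[], float] = time.monotonic,
                 max_depth: int = 6) -> Optional[Plan]:
    """Iterative deepening over action sequences that always holds a best-so-far plan
    and returns it as soon as `clock()` reaches `deadline`."""
    best: Optional[Plan] = None
    best_score = float("-inf")
    for depth in range(1, max_depth + 1):
        for plan in itertools.product(actions, repeat=depth):
            if clock() >= deadline:  # out of time: deliver whatever we have
                return best
            s = score(plan)
            if s > best_score:
                best, best_score = plan, s
    return best  # search exhausted before the deadline

def score(plan: Plan) -> float:
    """Hypothetical score: reward 'right' moves, with a small cost per step."""
    return plan.count("right") - 0.1 * len(plan)

ticks = itertools.count()  # simulated clock: each reading advances one unit
print(anytime_plan(["left", "right"], score, deadline=3, clock=lambda: next(ticks)))
# ('right',)
print(anytime_plan(["left", "right"], score, deadline=float("inf")))
# ('right', 'right', 'right', 'right', 'right', 'right')
```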
Chapter 7
Philosophy


The process of induction is the process of assuming the simplest 
law that can be made to harmonize with our experience. This 
process, however, has no logical foundation, but only a 
psychological one. 


Wittgenstein, ‘Tractatus Logico-Philosophicus’, 6.363 [367] 


In many respects, machine learning’s current concerns are reminiscent of those which 
heralded the rejection of GOFAI. Since it was evidently not possible to construct a 
suitably malleable model of the world a priori in terms of rigid logical facts, the 
solution was surely to induce the required representations, ideally from raw data? 
Given the challenges previously discussed, the requirement to create robust models 
is just as pressing today as it was in the early 1990s, when GOFAI was nominally 
supplanted by the ‘Physical Grounding Hypothesis’ [35]. In that sense, AI still needs 
learning algorithms that can do more than ‘skim off the surface’ of the world they 
attempt to represent. By this, we mean that knowledge representation should enjoy 
both robustness and malleability. By ‘robustness’ we mean Gestalt compositional 
interpretation in the presence of noise, so that turtles are not considered to be rifles 
[9] even if their images have some similarity at a local scale. By ‘malleability’ we 
mean the ability to envision a range of alternative hypotheses which are compatible 
with some context. In order to achieve this, we believe that machine learning needs 
to undergo the same fundamental shift that took place in the philosophy of science 
in the mid 20th century. 
7.1 The Problem of Machine Induction


The discussion of Chaps. 4 and 5 illustrates that a major concern for DL and RL is the
ability to obtain robust generalizations with high sample efficiency.¹ However, the
ubiquity of domains with long-tailed distributions is antithetical to the very notion that 
learning can be predominantly driven via sampling. For example, a widely-respected 
pioneer of autonomous vehicles [339] has stated: 


To build a self-driving car that solves 99% of the problems it encounters in everyday driving 
might take a month. Then there’s 1% left. So 1% would still mean that you have a fatal 
accident every week. One hundredth of one percent would still be completely unacceptable 
in terms of safety. 


Human drivers, though naturally fallible, are considerably more robust to the combi- 
natorially vast range of situations they might encounter. What change in perspective 
might be required in order to imbue a system with analogous capabilities? 

The essential practice of supervised learning is to tabulate samples of input/output 
observations and fit a regression model based on a numerical loss function. Like- 
wise, the RL framework optimizes an objective function sampled iteratively from 
the environment, and deep RL uses deep learning tools for function approximation. 
As previously observed [212, 344], in terms of the philosophy of science, this is 
very much in the empiricist tradition, in which observations are the primary entities. 
In DL, treating observations as primary has led to the notion of model induction 
as a curve-fitting process, independent of the domain of discourse. However, the 
incorporation of sufficient analytic information can obviate the need for sampling 
in both cases. As observed by Medawar, ‘theories destroy facts’ [221]: in order to 
predict e.g. future planetary motion, we do not need to tabulate the position of all 
the molecules that compose a celestial body, but rather apply Newton’s Laws to 
a macroscale approximation of its center of gravity [326]. Similarly, under certain 
conditions a description of the behavior of the simple pendulum can be obtained 
in closed form [160], demoting empirical sampling of orbits to the role of fitting a 
distribution to any remaining noise. Indeed, science itself can arguably be charac- 
terized as the progressive transformation of ‘noise’ into ‘signal’: replacing, insofar 
as human comprehension permits, uncertainty and nondeterminism with coherent 
(i.e. relationally-consistent) structure. The resulting structures yield a much stronger 
notion of ‘compression’ than expressed by the corresponding use of the term in RL. 
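The pendulum example can be made concrete. The text says only ‘under certain conditions’. We assume the small-angle regime, in which θ(t) = θ₀ cos(√(g/L) t) for a pendulum released from rest. The sketch compares this closed-form theory with a numerically integrated (‘tabulated’) trajectory of θ'' = −(g/L) sin θ. Integration uses classical fourth-order Runge–Kutta. What remains after the theory is applied is a small residual, which is all that is left for empirical fitting. The value of g and the step size are our choices.

```python
import math
from typing import List, Tuple

G_ACCEL = 9.81  # m/s^2, assumed value of gravitational acceleration

def closed_form(theta0: float, length: float, t: float) -> float:
    """Small-angle solution for a pendulum released from rest at angle theta0 (rad)."""
    return theta0 * math.cos(math.sqrt(G_ACCEL / length) * t)

def simulate(theta0: float, length: float, t_end: float,
             dt: float = 1e-3) -> List[Tuple[float, float]]:
    """RK4 integration of theta'' = -(g/L) sin(theta); returns (t, theta) samples."""
    def deriv(th: float, om: float) -> Tuple[float, float]:
        return om, -(G_ACCEL / length) * math.sin(th)

    th, om, t = theta0, 0.0, 0.0
    samples = []
    while t < t_end:
        samples.append((t, th))
        k1 = deriv(th, om)
        k2 = deriv(th + 0.5 * dt * k1[0], om + 0.5 * dt * k1[1])
        k3 = deriv(th + 0.5 * dt * k2[0], om + 0.5 * dt * k2[1])
        k4 = deriv(th + dt * k3[0], om + dt * k3[1])
        th += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        om += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += dt
    return samples

# 'Tabulated' trajectory versus the closed-form theory, for a small swing of 0.05 rad.
samples = simulate(theta0=0.05, length=1.0, t_end=10.0)
residual = max(abs(th - closed_form(0.05, 1.0, t)) for t, th in samples)
print(f"largest deviation over 10 s: {residual:.1e} rad")
```

At large amplitudes the small-angle condition fails, and the residual is no longer negligible.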

Although the empiricist perspective has prevailed since the Renaissance, it was 
inextricably bound to a deep philosophical problem: the ‘Problem of Induction’. The 
problem asks what firm basis we have to believe that past inferences will continue 
to be valid, e.g., that the Sun will continue to rise in the morning. Epistemologically, 
we cannot hypothesize that past distributions of observations will resemble future 
distributions, since this begs the question. The problem resisted all solution attempts
until Karl Popper provided one in the mid 20th century [268]. Popper’s solution
was to show that conclusions could be well-founded without requiring that ‘laws’ or
distributions be somehow Platonically propagated through time. Instead, he argued
that although our hypotheses may be inspired by observation, they are altogether of a
higher order, being predominantly characterized by their explanatory power. As
subsequently developed further by Deutsch [63], the key objects of discourse for science
are therefore not observations but the explanatory characteristics of the hypotheses
which they motivate. Hence, consistent with the definition of Sect. 2.2, we can consider
a hypothesis as an inter-related system of statements intended to account for
empirical observations. The tentative, self-correcting nature of the scientific method
means that:


• At any given instant, the collection of statements is not necessarily entirely
self-consistent (cf. the longstanding inability to reconcile quantum mechanics and
general relativity).

• Falsifiability via observation is not the primary driver. Although Popper emphasized
falsification via observations, a subsequent refinement [186] emphasized
that the prevailing hypothesis may not even agree with all observations, provided
that it is a rich enough source of alternative hypotheses which potentially could
do so. It seems reasonable to consider this to be the spirit underlying the opening
Feynman quote.


1 Addressing this concern is necessary but not sufficient: even if we could obtain robust generalizations
via current DL methods, they would be neither granular nor compositional, both of which we
subsequently argue for.


A famous demonstration that such heuristics are part of common scientific prac- 
tice is the discrepancy between the predictions of general relativity and the observed 
rotational velocities of galaxies [246], relativity being a hypothesis which has repeat- 
edly been vindicated in other wide-ranging experiments. Hence, while there is still 
some global inconsistency between different local models (which will continue to 
motivate further, hopefully ultimately unifying, hypotheses), this still allows the use- 
ful application of local models at the appropriate scale. Over the years, the philosophy 
of science has conjectured various heuristics for confronting rival hypotheses: 


• Parsimony: this heuristic is exemplified by ‘Occam’s Razor’. However, it must be
stressed that this is not merely a domain-independent measure such as is advocated
by Algorithmic Information Theory [43], but something that is achieved via reflective
interpretation of the hypothesis in order to reconcile causal inconsistencies.

• ‘Hard to Vary’: This notion was introduced by Deutsch [63]. Hypotheses which are
so unconstrained as to permit the generation of many roughly-equivalent alternatives
are unlikely to capture the pertinent causal aspects of a situation. Conversely,
when a hypothesis which has been preferred to many others generates few or no
alternatives, that is an indication that it is a good hypothesis. An initial investigation
of the role played by ‘hard to vary’ heuristics in AI is given by Elton [79].


It is also important to note that, by virtue of compositionality, the notion of hypothesis
here is stronger than ‘distribution over outcomes’ [64]. For example, suppose that
six in a hundred patients who are flu sufferers were to hold a crystal and experience
a subsequent improvement. Despite statistical significance, an experimenter would
not (in the absence of some other deeply compelling reason) subscribe immediately
to the notion that the crystal was the cause, because of the end-to-end consistency
of existing explanations about how viral infections and crystals actually operate. 
Such inferences therefore operate at a different level than purely statistical notions, 
in which claims of causality must anyway be justified in terms of priors known to the 
domain-aware researcher when they frame the experiment. Hence the researcher here 
has two privileges that traditional ML lacks: firstly, prior semantic knowledge about 
the type of variables (the displacement of the pendulum bob is measured in radians, 
the color of the pendulum bob is a property of the material with which it is coated, 
etc.). Secondly, in the case that prior knowledge (or hypothetical interventions such 
as Pearl’s ‘do operator’ [254]) does not adequately make the case for causality, the 
researcher has the potential to clarify further via alternative experiments. 

Scientific explanations have proved remarkably effective in describing the world 
[363]; for example, our understanding of force and motion at the ‘human scale’ (i.e.,
between quantum and relativistic) has remained robust since Newton. Most signif- 
icantly, such understanding is emphatically not in general a quantitative function 
of the causal chain (e.g., some loss or objective function), but is instead dependent 
on the overall consistency of explanation. ‘Consistency’ here means not only con- 
sistency with respect to empirical observation, but the ‘internal consistency’ of the 
entire causal chain described by the hypothesis. 

The solution to the ‘Problem of Machine Induction’ should therefore precisely 
mirror Popper’s solution to the ‘Problem of Induction’, i.e., to reject empiricism in 
favor of explanatory power and attempt to afford suitably curious machine learners 
the same privileges in determining causality as are presently enjoyed only by human 
experimenters. In the remainder of this chapter, we describe ‘Semantically Closed 
Learning’, a framework proposed to support this. 


7.2 Semantically Closed Learning (SCL) 


Just as the logical expressions of GOFAI could be said to be too ‘rigid’ with respect
to their ability to model complex environments, so the parameter space of DL architectures
is too ‘loose’. Hence, while it is relatively computationally inexpensive to fit
a deep learning model to almost any naturally-occurring observations [203], generalization
is certainly not assured [374]. Something intermediate therefore appears to be
required: an approach that needs a priori provision of neither an arbitrarily
complex objective function nor an exponentially large collection of rules. To that end,
we describe below a set of operations intended to support principled and scalable
scientific reasoning. In particular, the ‘scientific’ aspect of reasoning can be characterized
by the gradual progression from an extensional representation (i.e., pointwise
tabulation of the effects of operators) to an intensional one (i.e., representable as an
expression tree with a knowable semantic interpretation), as with the analytic description
of the pendulum given above.² These operations are invoked by a granular
inference architecture, a reference version of which is described in the next chapter.
We tie these together under the heading of ‘Semantically Closed Learning’ (SCL), 
the name having been chosen with reference to the property of ‘semantic closure’. In 
the context of open-ended evolution, the term semantic closure was coined by Pattee 
[252, 253], who described it as: 


An autonomous closure between (A) the dynamics (physical laws) of the material aspects 
of an organization and (B) the constraints (syntactic rules) of the symbolic aspects of said 
organization. 


An implementation of such open-ended evolution is given in Clark et al. [51]. They 
describe a ‘Universal Constructor Architecture’, in which genomes both contain and 
are decoded via an expressor. The decoding process is stateful (being analogous to 
gene transcription), and may experience degradation via contact with the environment.
Notably, they state:


The cleanest possible example of demonstrating semantic closure: the genome originally 
encoded the seed expressor, now it encodes a different expressor, but the genome string 
itself is not altered at any time, only the meaning of the genome string has been altered. 


Abstracting from the concrete implementation of Clark et al., we consider a seman- 
tically closed system to be one equipped with a stateful interpreter, such that: 


• The next step in the state trajectory is determined via the application of the interpreter.

• The interpreter is jointly a function of the state trajectory of the system and its
interaction with the environment.


This suffices to describe systems capable of open-ended evolution, but is closer 
to the notion of ‘AI as organism’ than the required one of ‘AI as tool’. Hence, we 
must additionally cater for the achievement of goals (self-imposed or otherwise). We 
therefore define a semantically closed learner as a semantically closed system whose 
interpreter state is adapted, via interaction with the environment, so as to reduce the 
discrepancy between expected and actual states. 
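The definition above can be illustrated with a deliberately tiny sketch. Everything in it is an illustrative assumption rather than part of SCL: the interpreter is a single numeric parameter `gain`, the environment is a fixed linear map hidden from the learner, and the repair rule is a normalized least-mean-squares update. It shows only the structural point that the interpreter belongs to the system's own state and is adapted from the discrepancy between expected and actual observations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SemanticallyClosedLearner:
    """Toy learner whose interpreter (a single hypothetical `gain`) is part of its own state."""
    gain: float = 0.5   # interpreter state: maps an observation to an expectation
    rate: float = 0.5   # hypothetical repair rate
    trajectory: List[Tuple[float, float, float, float]] = field(default_factory=list)

    def interpret(self, observation: float) -> float:
        # Apply the current interpreter to produce the expected next observation.
        return self.gain * observation

    def step(self, observation: float, environment: Callable[[float], float]) -> float:
        expected = self.interpret(observation)
        actual = environment(observation)          # interaction with the environment
        discrepancy = actual - expected
        if observation != 0:
            # Repair the interpreter so as to shrink the discrepancy
            # (normalized least-mean-squares update).
            self.gain += self.rate * discrepancy / observation
        self.trajectory.append((observation, expected, actual, self.gain))
        return actual                              # next point of the state trajectory


def true_dynamics(x: float) -> float:
    # Hypothetical environment; its coefficient is unknown to the learner.
    return 0.9 * x


learner = SemanticallyClosedLearner()
obs = 1.0
for _ in range(40):
    obs = learner.step(obs, true_dynamics)
print(round(learner.gain, 6))  # 0.9
```

In a full SCL system the interpreter would be a structured, typed hypothesis rather than one number, and it would also emit actions; the sketch omits both.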

When the discrepancy between actual and expected states is determined via interaction
with a ‘sufficiently complex’ environment (see Prop-3 in Sect. 7.3 below
for more details), the above notion of semantic closure affords situatedness. The
interpreter maps from system state to predictions and actions, being subject to repair
when predictions are not met. Learning thus takes place as a function of the discrepancy
between the actual and desired state of affairs. The important aspect for
general intelligence purposes is support for open-ended learning. Such learning is 
considered here as an operationalization of the scientific method, which requires that 
an agent can generate hypotheses which it can process as first-class entities. 


2 As previously stated in Chap. 4, we of course acknowledge that not all phenomena can be effectively
compressed analytically. In certain cases, deep learning is indeed a viable approach, but the
anticipation is that this will occur predominantly at the ‘leaves’ of the expression tree, which are 
operating directly on raw sensor data. 
In one sense, all human beings are scientists (cf. Sloman on ‘Toddler theorems’
[317]). For example, even at a nascent level of cognition, concept formation [262]
can be seen as abstracting from an iterated process of hypothesis generation/validation.
One may consider higher levels of cognition to be hierarchical, in the sense that
they make use of lower-level hypotheses (such as object permanence) as elements. A 
certain amount of introspection into one’s own problem-solving activity will reveal 
that higher levels of human reasoning are an ‘end-to-end white-box’ activity: arbitrary
(and even novel) features of both a problem and its proposed solutions can
be confronted with one another. These features are of course ultimately grounded
in experience of the real world [187]. As such, the hypotheses evoked by any confrontation
of features are so strongly biased towards the real world that events from
the long tail of impossible or vanishingly unlikely worlds are never even entertained.
However, to talk purely in terms of bias as a means of efficient inference misses
a key aspect of human cognition in general, and the scientific method in particular:
a hierarchy of compositional representations offers the potential to reason at a
much coarser granularity than the micro-inferences from which it was constructed. 
Therefore, reasoning can be considered to occur in a Domain-Specific Language 
(DSL) which has been abstracted from the environment by virtue of the ubiquity and 
robustness of its terms [87, 187]. This is in contrast to the prevailing approach in 
ML, in which inference is entirely mediated through numerical representations, as 
biased via some loss or reward function. There, some level of generality is achieved 
by reducing the notion of feedback to the ‘lowest common denominator’ across 
problem domains. 

It is therefore instructive to consider compositional learning in the context of the 
historical development of intelligent systems. The early cyberneticists understood 
that ‘purposeful’ behavior must be mediated via feedback from the goal [299]. Since 
they were predominantly concerned with analog systems, the feedback and the input 
signal with which it was combined were commensurate: they could be said to be 
‘in the same language’. In the move to digital systems, this essential grounding
property is typically lost: feedback is often mediated via a numeric vector obtained
through a (generally surjective) mapping from some more richly-structured source of information.
Useful information is therefore lost at the boundary between the learner and the feedback
mechanism. In Sect. 9.4, we describe a compositional mechanism for hybrid
symbolic-numeric feedback at arbitrary hierarchical levels that does not intrinsically 
require any such information loss. 
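The information loss described above can be made concrete with a small sketch. The constraint names and the averaging rule are hypothetical, chosen only for illustration: two different patterns of goal-constraint violation collapse to the same scalar reward, whereas the structured feedback still records which constraint needs repair.

```python
from typing import Dict, Set

# Hypothetical structured feedback: degree of violation per goal constraint.
Feedback = Dict[str, float]


def scalar_reward(fb: Feedback) -> float:
    """Collapse structured feedback into one number (negated mean violation)."""
    return -sum(fb.values()) / len(fb)


def violated(fb: Feedback) -> Set[str]:
    """The structured view keeps *which* constraints failed."""
    return {name for name, v in fb.items() if v > 0}


fb_a = {"gripper_open": 0.0, "object_lifted": 1.0, "object_upright": 0.0}
fb_b = {"gripper_open": 1.0, "object_lifted": 0.0, "object_upright": 0.0}

print(scalar_reward(fb_a) == scalar_reward(fb_b))  # True: identical as scalars
print(violated(fb_a), violated(fb_b))              # {'object_lifted'} {'gripper_open'}
```

A learner that only sees the scalar cannot tell whether to revise its grasping or its lifting hypothesis; the structured form points at the responsible constraint directly.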


7.3 Baseline Properties of SCL 


As detailed subsequently, while the specific choice of expression language for the 
DSL is rightfully an open-ended research problem, a number of elementary properties 
are required: 
Fig. 7.1 Concepts involved in the reasoning process. a A rule (colored circle) implements a relation
between typed values (shapes on either side). For forward inference, rules are read left-to-right: an 
object of one type is transformed into an object of another type via a transfer function. b A type 
may be structured in terms of other types. c A repertoire of rules and types. Rules are values and 
may be composed, such as in the blue and gray rules. Rule firing is also a value (here depicted on 
the left side of the yellow rule), and so the reasoning process (i.e., the production of inferences) 
can be reasoned about. d A possible unfolding of forward inferences produced by the repertoire. e 
Inferences can produce new rules—they can also produce new types (not depicted) 


Support for Strong Typing (Prop-1) 


At the most elementary level of representation, labels for state space dimensions 
can be used to index into code (e.g. stored procedures) and/or data (e.g. ontologies).
Building upon this explicit delineation and naming of state space dimensions,
a defining property of SCL is the use of a strongly-typed ‘expression language’ [265]
which can be used to represent both constrained subregions of the state space and
the ‘transfer functions’ that map between such regions (see Fig. 7.1). Types therefore
form the basis for a representation language which, at a minimum, constrains inference
to compose only commensurate (i.e. type-compatible) objects. Unlike testing,
which can only prove the presence of errors, strong typing can witness the absence
of errors (and indeed, more general safety guarantees). An elementary example
is the construction of causal models which are physically consistent with
respect to dimensional analysis.
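As a minimal sketch of dimensionally consistent composition, the hypothetical `Quantity` type below tags each value (stored in SI units) with its exponents over mass, length and time, and refuses to add incommensurate quantities. Python can only perform this check when the code runs; in a language with statically checked units (for instance F#'s units of measure) the same mismatch would be rejected before execution, which is the stronger guarantee the text refers to. The conversion factor is the standard definition of the pound-force.

```python
from dataclasses import dataclass

# Dimension exponents over the base dimensions (mass, length, time).
FORCE = (1, 1, -2)    # newton: kg*m/s^2
TIME = (0, 0, 1)      # second
IMPULSE = (1, 1, -1)  # newton-second: kg*m/s

LBF_IN_NEWTONS = 4.4482216152605  # one pound-force expressed in newtons


@dataclass(frozen=True)
class Quantity:
    """A value stored in SI base units, tagged with its dimension."""
    value: float
    dim: tuple

    def __add__(self, other: "Quantity") -> "Quantity":
        # Only commensurate quantities may be composed by addition.
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} + {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplication adds dimension exponents.
        dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)


def newtons(x: float) -> Quantity:
    return Quantity(x, FORCE)


def seconds(x: float) -> Quantity:
    return Quantity(x, TIME)


def pound_force_seconds(x: float) -> Quantity:
    # Conversion happens once, at construction, so a raw imperial
    # number never leaks into SI arithmetic.
    return Quantity(x * LBF_IN_NEWTONS, IMPULSE)


impulse = newtons(10.0) * seconds(2.0)      # dimension is IMPULSE
total = impulse + pound_force_seconds(1.0)  # both impulses, both in SI
print(round(total.value, 3))                # 24.448
try:
    impulse + newtons(1.0)                  # a force is not an impulse
except TypeError as err:
    print(err)  # dimension mismatch: (1, 1, -1) + (1, 1, -2)
```

Note that dimensions alone would not distinguish pound-force seconds from newton seconds (both are impulses); it is the explicit, typed construction of each quantity that prevents the kind of unit confusion described next.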

In software engineering, such modeling has well-understood safety implications; 
for example, the bug that led to the loss of NASA’s ‘Mars Climate Orbiter’ in 1999 
was due to an invalid conversion between two different dimensional representations 
of impulse [215]. However, this example only scratches the surface of what can be 
expressed [263, 265]: the rich body of work in type theory is an ongoing investigation
into which aspects of system behavior can be expressed statically, i.e., without
requiring actual program execution. For example, certain invariants of the application 
of transfer functions to subregions of the state space can be modeled via refinement 
types, which use predicates to define which subset of the values representable
via a tuple actually corresponds to valid instances of the type; as pedagogical
examples, one can explicitly define the type of all primes or the type of pairs of even
integers.

Fig. 7.2 The property of ‘endogenous situatedness’ imbues an agent with knowledge of its own
causal abilities, which includes various proxies for the capabilities of its own reasoning process.
This of course also requires a reflective representation and declarative goals and constraints

As well as constructing arbitrary new dimensions of singleton type, it is
possible to create new dimensions via other type-theoretic constructions, e.g. tupling 
or disjoint union of existing types [176]. Since there are intrinsic trade-offs between 
expressiveness, decidability, and learnability, the specific choice of type system is 
intentionally left open, being rightfully a matter for continuing research (a concrete 
example is nonetheless provided in Sect. 10.1). Fortunately, the constructions for 
compositional inference given in Chap. 9 can be defined in a manner that is agnostic 
with respect to the underlying type system. 
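Python has no refinement types, but the two pedagogical examples can be approximated by a sketch that checks a predicate whenever a value is admitted to the type. The `Refinement` class and its `make` method are hypothetical names used only for illustration; dedicated systems such as Liquid Haskell instead verify such predicates statically, without running the program.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Refinement:
    """A base type narrowed by a predicate; here the predicate is checked at runtime."""
    name: str
    base: type
    predicate: Callable[[Any], bool]

    def make(self, value: Any) -> Any:
        # Admit only values of the base type that also satisfy the predicate.
        if not isinstance(value, self.base) or not self.predicate(value):
            raise ValueError(f"{value!r} is not a valid {self.name}")
        return value


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


Prime = Refinement("Prime", int, is_prime)
EvenPair = Refinement(
    "EvenPair", tuple,
    lambda p: len(p) == 2 and all(isinstance(x, int) and x % 2 == 0 for x in p),
)

print(Prime.make(7), EvenPair.make((2, 4)))  # 7 (2, 4)
try:
    Prime.make(9)
except ValueError as err:
    print(err)  # 9 is not a valid Prime
```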


Reflective State Space Representation (Prop-2) 


As a minimum, the state space includes the actionables and observables of the envi- 
ronment and/or the system. As discussed in Chap. 6, it must be possible to explicitly 
declare objectives (‘goals’) as delineated regions within the state space. While the 
base dimensions of the state space (corresponding to sensors and actuators for a situated
agent) are specified a priori, the representation may also permit the construction
of synthetic dimensions (e.g. to denote hidden variables or abstractions, as described
below), similar to Drescher’s ‘synthetic items’ [71]. As discussed under ‘Work on
Command’, the property of reflection obviates the need for sampling of rewards, and 
allows for dynamic changes to goal specification, since state space constraints are 
available to the agent. A reflective state space is also key for enabling the creation of 
new types through abstraction. 
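The following sketch illustrates, under simplifying assumptions, a state space with base dimensions, a derived synthetic dimension, and a goal declared as an explicit region of that space. All names are hypothetical, and a box-shaped goal region is chosen purely for brevity. Because the goal is inspectable, its distance from the current state can be computed directly, and the goal can be respecified at run time without any retraining.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

State = Dict[str, float]


@dataclass
class StateSpace:
    """Base dimensions (sensors/actuators) plus derived, synthetic dimensions."""
    base: Tuple[str, ...]
    synthetic: Dict[str, Callable[[State], float]] = field(default_factory=dict)

    def add_synthetic(self, name: str, fn: Callable[[State], float]) -> None:
        self.synthetic[name] = fn

    def expand(self, s: State) -> State:
        missing = set(self.base) - set(s)
        if missing:
            raise KeyError(f"missing base dimensions: {sorted(missing)}")
        full = dict(s)
        for name, fn in self.synthetic.items():
            full[name] = fn(full)
        return full


@dataclass
class Goal:
    """A goal declared as a box-shaped region: dimension -> (low, high)."""
    bounds: Dict[str, Tuple[float, float]]

    def contains(self, s: State) -> bool:
        return all(lo <= s[d] <= hi for d, (lo, hi) in self.bounds.items())

    def distance(self, s: State) -> float:
        # The region is inspectable, so the gap to it is computed directly
        # rather than estimated from sampled rewards.
        return sum(max(lo - s[d], 0.0, s[d] - hi) for d, (lo, hi) in self.bounds.items())


space = StateSpace(base=("x", "y"))
space.add_synthetic("radius", lambda s: (s["x"] ** 2 + s["y"] ** 2) ** 0.5)
state = space.expand({"x": 3.0, "y": 4.0})

goal = Goal({"radius": (0.0, 1.0)})
print(goal.contains(state), goal.distance(state))  # False 4.0
goal = Goal({"radius": (0.0, 6.0)})  # goal respecified at run time
print(goal.contains(state))  # True
```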
Endogenous Situatedness (Prop-3)


A system has ‘grounded meaning’ if its symbols are a context-sensitive function
of the system’s experience, a property that GOFAI systems typically lack. In his
seminal work, Harnad considers a system to be grounded [134] if reasoning is driven
by sensory inputs (or invariants induced therefrom) from the real world. Wang [355]
argues that this notion of ‘real world’ can be relaxed to apply to ‘real-time operation
in any complex, uncertain dynamic environment’, provided that (1) symbol interpretation
is contextually driven by information obtained from the environment and (2)
information is updated via a feedback loop that includes the action of the system. The 
reflective state space representation of end-to-end hypothesis chains of Prop-2 thus 
suffices for the system to be situated in its environment. That is, the mapping between 
grounded sensors and effectors proceeds via a world model in which feedback from 
effectors is reflectively inspectable. 

However, there is a yet stronger notion of ‘situated’ which more closely captures 
a system’s causal capabilities: being endogenously situated. This arises from the 
observation that “an organism’s own patterns [...] are also stimuli” [97]. A system is
therefore endogenously situated when (at least some of) its internal representations³
are also considered part of the environment in which the system operates, and
these endogenous stimuli are given meaning via their ultimate participation in causal
sensor-effector chains.


Open-Ended Continual Granular Inference (Prop-4) 


As discussed, our pragmatic definition of general intelligence emphasizes the need for
flexibility of response. This requires that an intelligent system avoid the ‘perceive-act-update’
cycle of traditional RL and GOFAI, in which it is effectively assumed that
the system and the world progress in lockstep. Since system deliberation time will
necessarily increase with environment and task complexity, the lockstep approach
will not scale. As per previous work on time-bounded inference [242, 243], the alternative
is to perform many simultaneous inferences of smaller scope, each inference
having a WCET (worst-case execution time) that is both small and bounded, hence
‘granular’.

In some of our previous work [242], scheduling based on dynamic priorities 
is used to explore a vast number of lines of reasoning in parallel, while retaining 
flexibility and responsiveness. As described in more detail in the next chapter, by
virtue of scheduling over granular inferences, attention is then an emergent process: 
the analog of attention-weights [16] are the priorities, which are dynamically updated 
as a function of the expected value of inference chains. 
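A minimal sketch of priority-driven scheduling over granular inferences is given below. It is a simplification, not the scheduler of [242]: each step is a small function that may spawn follow-up steps, a child's priority is taken (as an assumption) to be its parent's priority scaled by the child's expected value, priorities are assigned once at spawn time rather than continuously updated, and a fixed step budget stands in for bounded deliberation time. Which steps actually run within the budget is the emergent ‘attention’.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

# A granular inference step returns follow-up steps as (expected_value, name, step).
Child = Tuple[float, str, Callable[[], list]]


def run(initial: List[Child], budget: int) -> List[str]:
    """Run the highest-priority steps first until the step budget is spent."""
    tie = itertools.count()  # tie-breaker so that step functions are never compared
    queue = [(-value, next(tie), name, step) for value, name, step in initial]
    heapq.heapify(queue)
    trace: List[str] = []
    while queue and len(trace) < budget:
        neg_priority, _, name, step = heapq.heappop(queue)
        trace.append(name)
        for value, child_name, child_step in step():
            # A child's priority is its parent's priority scaled by its expected value.
            heapq.heappush(queue, (neg_priority * value, next(tie), child_name, child_step))
    return trace


def leaf() -> list:
    return []


def promising() -> list:
    return [(0.9, "promising.refine", leaf)]


def dubious() -> list:
    return [(0.2, "dubious.refine", leaf)]


print(run([(0.8, "promising", promising), (0.6, "dubious", dubious)], budget=3))
# ['promising', 'promising.refine', 'dubious']
```

With a budget of three steps, the low-value refinement of the dubious line of reasoning is never executed; with a larger budget it would eventually run.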


3 For example, projections from SCL’s internal ‘train of thought’ state trajectory, monitoring of 
resource usage, etc., as described in Chap. 8. 
7.4 High-Level Inference Mechanisms of SCL


Building on the support for strong typing (Prop-1) and reflective state space representation
(Prop-2), SCL makes use of four methods of compound inference: hypothesis
generation, abduction, abstraction, and analogy. All inference steps in SCL can be
considered to be the application of some rule r : A → B, for types A and B. If no
such rule exists, then it is necessary to synthesize it, as described in more detail in 
Chap. 10. This synthesis process may involve any combination of the following: 


Abstraction 


For purposes of SCL, abstraction is considered to be the process of factorizing commonality
from two or more hypotheses. One can view this factorization as a parametric
or ‘partially instantiated’ hypothesis. A suitable choice of parameters may allow
the motivating hypotheses to be (at least approximately) recovered. In a numerical 
domain, approaches such as PCA/SVD [25] compress a set of empirical observations 
into a basis set from which observations can be reconstructed via a specific weighting 
of the basis vectors. In SCL, the methods used for decomposition and reconstruction 
of the state space must be applicable at the symbolic level of expression trees, as 
well as for any numerical expressions at their leaves. In Sect. 9.3, we describe one 
possible compositional mechanism for abstraction. 
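One standard symbolic technique for factoring commonality out of expression trees is first-order anti-unification (Plotkin's least general generalization). The sketch below uses it as a stand-in; it is not the mechanism of Sect. 9.3. Differing subterms become parameters (`?X0`, `?X1`, ...), and substituting the recorded bindings recovers each motivating expression exactly. The example generalizes the periods of a pendulum, 2π√(L/g), and of a mass on a spring, 2π√(m/k).

```python
from typing import Dict, Tuple, Union

# A term is either an atom (str) or a compound (head, arg1, arg2, ...).
Term = Union[str, tuple]


def anti_unify(a: Term, b: Term, pairs: Dict[Tuple[Term, Term], str]) -> Term:
    """Least general generalization of two terms (Plotkin-style anti-unification)."""
    if a == b:
        return a
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        # Same head and arity: keep the head, generalize the arguments pairwise.
        return (a[0],) + tuple(anti_unify(x, y, pairs) for x, y in zip(a[1:], b[1:]))
    # Differing subterms become a parameter; the same pair reuses the same parameter.
    if (a, b) not in pairs:
        pairs[(a, b)] = f"?X{len(pairs)}"
    return pairs[(a, b)]


def substitute(t: Term, binding: Dict[str, Term]) -> Term:
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(x, binding) for x in t[1:])
    return binding.get(t, t)


pendulum = ("*", "2pi", ("sqrt", ("/", "L", "g")))
spring = ("*", "2pi", ("sqrt", ("/", "m", "k")))

pairs: Dict[Tuple[Term, Term], str] = {}
template = anti_unify(pendulum, spring, pairs)
left = {var: x for (x, _), var in pairs.items()}
right = {var: y for (_, y), var in pairs.items()}

print(template)  # ('*', '2pi', ('sqrt', ('/', '?X0', '?X1')))
print(substitute(template, left) == pendulum, substitute(template, right) == spring)  # True True
```

The template plays the role of the ‘partially instantiated’ hypothesis, and the two bindings are the parameter choices that recover the motivating hypotheses.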


Hypothesis Generation 


This is the means by which salient hypotheses are generated. Hypothesis generation
interprets an existing hypothesis to yield a new one intended to have fewer
relational inconsistencies. It is a broader notion than the counterfactual reasoning
conducted using structural causal models (SCM), since rather than merely taking different
actions, it considers the overall consistency of alternative models. Informally,
this can be seen as the imposition of semantic/pragmatic constraints on expressions
in a generative grammar. As an example from the domain of natural language, the
famous phrase “Colorless green ideas sleep furiously” is syntactically valid, but not
semantically consistent: neither color nor the ability to sleep is a property typically
associated with ideas. The semantic inconsistency here is immediately obvious to
the human reader, but in general an artificial learning system must use feedback 
from the environment to discover any semantic inconsistencies in its interpretation. 
Hypothesis generation is described in detail in Sect. 9.2. 
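The kind of semantic constraint involved can be sketched as a check of typed selectional constraints over a syntactically valid expression. The lexicon, categories and exclusion pairs below are hypothetical and hand-written; as the text notes, an artificial learner would instead have to acquire such constraints from environmental feedback. The sketch only shows how the inconsistencies that hypothesis generation aims to reduce can be made explicit.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical lexicon: semantic categories of each noun.
NOUN_CATEGORIES = {
    "ideas": {"abstract"},
    "cats": {"physical", "animate"},
}

# Hypothetical selectional constraints: the category each predicate requires.
REQUIRES = {"colorless": "physical", "green": "physical", "sleep": "animate"}

# Hypothetical pairs of predicates that cannot hold of the same thing.
MUTUALLY_EXCLUSIVE: List[FrozenSet[str]] = [frozenset({"colorless", "green"})]

Problem = Union[str, Tuple[str, ...]]


def inconsistencies(subject: str, predicates: List[str]) -> List[Problem]:
    cats: Set[str] = NOUN_CATEGORIES[subject]
    # Predicates whose required category the subject lacks.
    found: List[Problem] = [p for p in predicates if REQUIRES[p] not in cats]
    # Predicates that contradict one another.
    present = set(predicates)
    found += [tuple(sorted(pair)) for pair in MUTUALLY_EXCLUSIVE if pair <= present]
    return found


print(inconsistencies("ideas", ["colorless", "green", "sleep"]))
# ['colorless', 'green', 'sleep', ('colorless', 'green')]
print(inconsistencies("cats", ["green", "sleep"]))  # []
```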
Analogical Reasoning


Analogy has been argued to be the “core of cognition” [146]. It can be considered
as a generative mechanism that factors out a common ‘blend’ [84] between two situations.
There is considerable literature on cognitive and computational models of
analogy: in-depth surveys can be found in Gentner and Forbus [106] and Prade and
Richard [271]. Analogy is generally considered to be either predictive or proportional.
Predictive analogy is concerned with inferring properties of a target object
as a function of its similarity to a source object (e.g. the well-known association 
between the orbits of planets in the solar system and electron shells in the Rutherford 
model of the atom). Study of proportional analogy extends at least as far back as 
Aristotle [52]. A proportional analogy problem, denoted: 


A : B :: C : D


is concerned with finding D such that D is to C as B is to A. For example, “gills are 
to fish as what is to mammals” is notated as: 


fish : gills :: mammals : ??? 


In Sect. 9.3, we describe one possible computational approach to proportional analogy.
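The simplest conceivable baseline for a proportional analogy A : B :: C : D is to find the relation(s) that link A to B and transfer them to C. The sketch below does exactly that over a hypothetical knowledge base of triples; it is not the approach of Sect. 9.3, and it sidesteps the genuinely hard part of analogy, namely discovering or constructing the relevant relational structure when it is not already explicit.

```python
from typing import Set, Tuple

# Hypothetical knowledge base of (subject, relation, object) triples.
FACTS: Set[Tuple[str, str, str]] = {
    ("fish", "breathes_with", "gills"),
    ("mammals", "breathes_with", "lungs"),
    ("fish", "moves_with", "fins"),
    ("mammals", "moves_with", "legs"),
}


def solve(a: str, b: str, c: str) -> Set[str]:
    """Solve a : b :: c : ? by transferring whichever relations link a to b."""
    relations = {r for (s, r, o) in FACTS if s == a and o == b}
    return {o for (s, r, o) in FACTS if s == c and r in relations}


print(solve("fish", "gills", "mammals"))  # {'lungs'}
print(solve("fish", "fins", "mammals"))   # {'legs'}
```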


Abduction 


By virtue of the reflective representation, it is possible to perform inverse inference. 
A hypothesis can thereby be updated directly by working backwards from its effects, 
rather than via the indirection of a sampled objective function, which was the primary 
objection raised in both Sects. 4.3 and 5.2. 

Fig. 7.3 Bidirectional rules. Rules support both induction and abduction; depending on their denotational
semantics, their inputs and outputs (marked ‘?’) are ascribed particular meanings. a Induction:
the output can be a prediction or a timeless entailment (e.g., an instance of a subtyping relation).
The inputs may be (counter)factual (e.g., sensory inputs or absence thereof), induced or abduced. b
Abduction: the input can be a goal, an assumption, or a (counter)fact. The outputs can be subgoals,
subassumptions, or timeless premises; they are not necessarily unique. c The choice of outputs is
constrained by an input

With a bidirectional reasoning process (illustrated in Fig. 7.3), it is possible to
‘backpropagate’ actions directly along the hypothesis chain from effects (‘failure of
an upside-down table to provide support’) to counterfactuals (‘if the table were turned
the other way up ...’). In DL and RL, the ‘representation language’ is untyped and
noncompositional, so this kind of direct modification of hypotheses is not possible. 
In Sect. 9.4, we describe a compositional mechanism for abduction. 
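The table example can be sketched with a rule that is readable in both directions. Everything here is a hypothetical illustration: the rule's input type is a small finite set, so its backward (abductive) reading can be obtained by brute-force enumeration, whereas in SCL the backward reading would come from the rule's reflective representation. As in the caption of Fig. 7.3, abduced premises need not be unique.

```python
from dataclasses import dataclass
from typing import Any, Callable, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """A rule r : A -> B whose input type A is a small finite set of values."""
    name: str
    domain: Tuple[Any, ...]
    forward: Callable[[Any], Any]

    def predict(self, a: Any) -> Any:
        # Forward reading: from a premise to its consequence.
        return self.forward(a)

    def abduce(self, b: Any) -> Set[Any]:
        # Backward reading: every premise that would yield b (possibly several).
        return {a for a in self.domain if self.forward(a) == b}


supports = Rule(
    name="table_supports_objects",
    domain=("upright", "upside_down", "on_side"),
    forward=lambda orientation: orientation == "upright",
)

print(supports.predict("upside_down"))  # False: the observed failure
print(supports.abduce(True))            # {'upright'}: the premise to bring about
print(sorted(supports.abduce(False)))   # ['on_side', 'upside_down']: not unique
```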


7.5 Intrinsic Motivation and Unsupervised Learning 


In contrast to the a priori problem formulation required for supervised learning, the 
scientific method is an iterative process of problem formulation and solving. Such an 
iterative approach performs both supervised and unsupervised learning, the former 
corresponding to the meeting of objectives supplied a priori, the latter being the search 
for more compelling hypotheses, potentially via new experiments. In this wider 
framework, hypotheses have the non-monotonic property of the scientific method 
itself, i.e., they are potentially falsifiable by subsequent observation or experiment. 

The aspiring robot scientist must therefore decide how to interleave the processes 
of observation and hypothesis generation. Prior art on this in the (simply-stated but
open-ended) domain of number sequence extrapolation is Hofstadter et al.’s Seek-Whence
[144], which decides when to take further samples as a function of the
consistency of its hypothesis. In a more general setting, the self-modifying PowerPlay
framework searches a combined task and solver space until it finds a new task, together
with a modified solver that solves it as well as all previously learned tasks [307, 323].
In more recent work, Lara-Dammer et al. [190] induce invariants in a ‘molecular dynamics’
microdomain in a psychologically-credible manner.
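The idea of deciding when to take further samples as a function of hypothesis consistency can be sketched in a few lines. This is not Seek-Whence's algorithm: the hypothesis space and the stopping rule (keep sampling while more than one hypothesis survives, up to a budget) are simplifying assumptions for illustration. The example uses the circle-division sequence of Moser's problem, whose first five terms coincide with the powers of two.

```python
from math import comb
from typing import Callable, Dict, List, Tuple

# Hypothetical hypothesis space: formulas for the n-th term (n = 1, 2, ...).
HYPOTHESES: Dict[str, Callable[[int], int]] = {
    "powers_of_two": lambda n: 2 ** (n - 1),
    "circle_regions": lambda n: comb(n, 4) + comb(n, 2) + 1,  # Moser's circle problem
    "odd_numbers": lambda n: 2 * n - 1,
}


def extrapolate(source: Callable[[int], int], max_samples: int = 10) -> Tuple[int, List[str]]:
    """Request terms one at a time until at most one hypothesis survives."""
    seen: List[int] = []
    alive = dict(HYPOTHESES)
    while len(seen) < max_samples and len(alive) > 1:
        n = len(seen) + 1
        seen.append(source(n))  # take a further sample only while ambiguity remains
        alive = {k: h for k, h in alive.items() if h(n) == seen[-1]}
    return len(seen), sorted(alive)


print(extrapolate(HYPOTHESES["circle_regions"]))  # (6, ['circle_regions'])
print(extrapolate(lambda n: 2 * n - 1))            # (2, ['odd_numbers'])
```

An unambiguous sequence is settled after two samples, while the deceptive one requires a sixth term (31 rather than 32) before the hypotheses can be told apart.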

In particular, our chosen definition of general intelligence acknowledges that 
resources (compute, expected solution time, relevant inputs, etc.) are finitely bounded. 
At the topmost level, the corresponding resource-bounded framework for the scientific
method is simple: within supplied constraints, devote resources to finding good
hypotheses, balancing the predicted merits of hypothesis refinement against the ability
of the refinement to further distinguish signal from noise. The presence of such
bounds is an intrinsic guard against the kind of ‘pathologically mechanical’ behaviors
that one might expect from algorithms which do not incorporate finite concerns
about their own operation, as detailed further in the next chapter. 
ACHILLES: The profession of Anteater would seem to be
synonymous with being an expert on ant colonies. 

ANTEATER: I beg your pardon. ‘Anteater’ is not my profession; 
it is my species. By profession, I am a colony surgeon. I 
specialize in correcting nervous disorders of the colony by the 
technique of surgical removal. 


Hofstadter, ‘Gödel, Escher, Bach’ [147] 


Machine learning excels at inducing mappings from data, but struggles to induce 
causal hierarchies. In contrast, symbolic reasoning (in particular, when considered 
as an expression language) can represent any form of domain knowledge and can 
index into code or data via pattern matching.¹ Evidently, reasoning and learning
must be robust to both the variability of inputs and the reliability of prior knowledge,
learned or imparted. In that regard, Marcus has argued extensively for neurosymbolic
hybrids [210, 213, 214], advocating the complementarity of distributed
representations (‘neuro’) and qualitative localist causal knowledge (symbolic); see
also d’Avila Garcez [59]. We explain in this chapter how SCL defines a framework
with equivalent goals: although not explicitly ‘neural’ in operation, its dynamic
attention mechanism can be considered to play a role analogous to that of neural
connection weights, though perhaps a better guiding metaphor than homogeneous
neurons is the ‘structural stigmergy’ [179] of ant colonies [70, 146]. We then proceed
to present a reference system architecture for SCL: its purpose is of course to show 
how SCL can be realized in silico and, by design, to provide sufficient guarantees 
for its claims (open-ended learning, anytime operation, grounded reasoning, etc.) at 
the operational level. 


1 Of course, heuristic synthesis of symbolic expressions (e.g. as in Genetic Programming [180])
has been practiced for decades, but has never really been considered as ‘mainstream’ ML. 
8.1 SCL as a Distributed/Localist Hybrid


Numerous authors have indeed attempted the integration of distributed and localist 
representations for more than twenty years. However, depending on the original 
focus of inquiry (reasoning, control theory, ML, or other), ‘integration’ can serve a 
rather broad variety of purposes. We now list a few exemplars of these to give some 
perspective to the purpose of hybridization in SCL; a broader survey can be found 
in Bharadhwgj et al. [23]. 

The AKIRA hybrid architecture addresses control rather than learning [257] and 
is designed around concurrent code fragments. These fragments are organized in a 
weighted dynamic network of dependencies, competing for budget-constrained computing
power on the basis of their expected outcome (goal satisfaction). Focusing
on learning instead of control, the DUAL/AMBR architecture [177, 178] controlled 
symbolic reasoning via spreading activation based on numeric truth values across 
a network of causal and structural relations, in a manner reminiscent of Copycat 
[146]. Clarion, another hybrid approach [327, 328], layered symbolic reasoning on 
top of neural networks, with the aim of enabling top-down and bottom-up learning 
processes. These were ultimately based on RL, thus imposing, architecture-wide, 
the fundamental limitations described in previous chapters. In pursuance of another 
objective, the Sigma architecture attempts to define intelligence in a principled manner
[295]: the authors claim ‘grand unification’, and ‘generic cognition’ based on
graphical models. It is conceivable that such a computational
substrate, pending further significant work, might become the lingua franca of representation
processes able to transcend levels of abstraction. However, the authors
have so far limited themselves to the reimplementation of established concepts such
as episodic memory [297], RL [296], and a ‘standard’ model of cognition based on
the ‘sense-think-act’ loop [298].

More recently, the majority of ongoing research on hybrids is unsurprisingly ML-centric,
attempting to remedy some of the inadequacies of deep neural networks
for reasoning applications. For example, some recurrent neural networks such as 
LSTM, although designed for sequential/temporal inference, face difficulties with 
learning long-term dependencies, mainly because their memory consists essentially 
of compressed (and degradable) input sequences. The Differentiable Neural Computer
(DNC) [124] alleviates this problem by coupling the network to an external
symbolic memory (other work uses different types of memory augmentation [7, 166, 
260]), but does so at the cost of magnifying other shortcomings of DL: the DNC is 
notoriously harder to train than LSTM (sample inefficiency), and its applicability to 
arbitrary tasks appears to depend even more than before on the architecture employed 
and its initial choice of parameters (brittleness). Several improvements have since 
been proposed [55, 92, 248] and have addressed some of the performance issues 
but less so the issue of applicability. As of today, the DNC performs reasonably 
well on latching, copying, associative recall, and simple state machines; the general 
framework of differentiable computing is preserved but the capabilities of the DNC 
remain very remote compared to the requirements for general reasoning (abstracting, 
analogy making, planning, etc.). Other approaches have made the opposite trade-off,
namely that of sacrificing end-to-end differentiability to accommodate more powerful
reasoning substrates. They proceed by inserting hand-crafted concepts directly
into the RL framework in various forms, e.g. conceptual graphs [152], algorithmic 
primitives [138, 193], Drescher-style schemata [71, 164], or ontologies and associated
auto-encoders [105]. In all of these cases, the agent learns a concept’s extension
and operational semantics, thus enabling planning via various forms of probabilistic 
inference. However, as recent research shows, hand-crafting of explicit knowledge 
structures can be entirely avoided in some cases (so far, mostly problems that can be 
addressed by direct visual attention). Notably, the PrediNet neural architecture [311], 
when subjected to an appropriate curriculum, is capable of acquiring some representations
with explicit relational and propositional structure. With PrediNet, the neural
substrate is used only for learning and does not commit to any specific form of knowledge
exploitation: this can be carried out either symbolically (e.g. using predicate
calculus, temporal logic, etc.) or, more speculatively, via differentiable ‘reasoning 
models’ [69, 225, 294]. 

These attempts at ‘getting the best of symbolic AI and ML’ proceed quite literally, 
essentially reimplementing reasoning (and the necessary supporting representations) 
in the terms and components of machine learning. We proceed differently. Three 
aspects of SCL are of particular relevance to what is expected from the reconciliation 
of ML and symbolic AI: 


• Strong typing.
• Fine-grained, open-ended, continual, and compositional inference.
• Emergent resource-aware and goal-directed attention.


We claim that these principles combine the strengths of both approaches: whilst 
SCL can be provided with prior domain knowledge in any desired form, it is not 
subject to the problems which plagued GOFAI, since the ability to reflectively reason
at the type level allows the sustained and progressive learning of invariants from
the environment. In SCL, learning and planning do not need to be reconciled. By
design, both learning and planning are side-effects of the same reasoning process
which unfolds in response to the pressure of external goals and constraints over limited
knowledge and resources (e.g. inputs, time, computational power and memory,
energy, physical agency, etc.). The research cited above pursues the objective of end-to-end
representability and actionability of structured knowledge; to this, we add
the (orthogonal) requirement of end-to-end controllability, i.e., self-referential goal-directed
resource allocation. In the SCL framework, the duality between distributed
and local does not concern the representation of functional knowledge (world models,
goals, etc. are already unified), but rather the representation of cognition itself.
For reasons explained below, we use distributed representations for controlling the
reasoning process, itself explicitly represented in the corpus of world knowledge
to describe the state of the ‘system-in-the-world’ (the ‘endogenous situatedness’
of Prop-3 in Sect. 7.3); Wang’s Non-Axiomatic Reasoning System (NARS) [358]
follows a similar approach. 
For open-ended learning, inputs may originate from outside the distribution from
which knowledge was initially learned. Yet progress must nonetheless be sustained,
both in terms of continual learning and action. ‘Falling outside the system’s comfort
zone’ is not sufficient reason for invalidating acquired knowledge. This would amount
to ‘stop, erase, and re-learn’, similar to the RL ‘learn-then-deploy’ procedure that,
as we have seen, runs counter to the desired property of anytime operation. In fact,
keeping going might just be the right thing to do: if foreign inputs happen to lie
on a semantic continuum with prior distributions, extrapolation would be correct, and the
system’s activity would carry on while vindicating its knowledge over increasingly
broad scopes. Of course, in the case of a semantic discontinuity, prior knowledge would
produce faulty extrapolations and the system might fail to meet its goals. In that
respect, the system has to perform two activities, continually and on a case-by-case
basis (possibly concurrently). The first consists of extending an initial distribution
with foreign inputs and vindicating the current related knowledge. The second is to
learn new knowledge from a seemingly novel distribution; the system must also
surmise explicit constraints on the relevance of the initial one to guard it against
further unwarranted use. This can be achieved by assessing the degree to which the
learned patterns match the new inputs and propagating the discrepancies across the
possible consequences at higher levels of abstraction in the knowledge hierarchy, since
these are ‘battle-hardened’ oracles: by construction, they are broader, more reliable
and, critically, change less frequently than the lower layers of the hierarchy.

SCL accommodates control heuristics for which truth values are not axiomatic 
but instead are assessed up to a certain degree (certainty is asymptotic, a tenet we 
share with Wang [356]). In this approach, truth values are not static but unfold over
time: they are multiplied at each step of inferencing and are also updated whenever
new (counter-)evidence becomes known [241]. At the conceptual level, dynamic
numeric truth values thus reflect, quantitatively, inference composition. Interpreted,
at the operational level, as the reliability of inference chains, truth values allow the system
to compute the worthiness of resource expenditure (i.e., for further ramifying inference
chains) with regard to the goals at hand. This forms the substrate from which goal-
directed attention emerges to control the reasoning process, as we will see in the next
section. 
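The two mechanisms just described, multiplication of truth values along an inference chain and their revision by new evidence, can be illustrated with a minimal sketch. It assumes a NARS-style evidence-count representation (frequency = positive/total evidence, confidence = total/(total + K)); the names TruthValue, chain_reliability and worth_expanding, the constant K and the benefit-versus-cost test are all hypothetical, not the book's definitions.

```python
from dataclasses import dataclass
from math import prod

K = 1.0  # assumed 'evidential horizon' constant, in the style of NARS


@dataclass
class TruthValue:
    """Evidence-based truth value (hypothetical representation)."""
    positive: float = 0.0  # evidence in favour
    total: float = 0.0     # all evidence observed so far

    def update(self, confirmed: bool) -> None:
        """Revise the truth value with one new piece of (counter-)evidence."""
        self.total += 1.0
        if confirmed:
            self.positive += 1.0

    @property
    def frequency(self) -> float:
        # Uninformative default of 0.5 before any evidence (an assumption).
        return self.positive / self.total if self.total else 0.5

    @property
    def confidence(self) -> float:
        # Tends towards 1 as evidence accumulates but never reaches it.
        return self.total / (self.total + K)

    @property
    def reliability(self) -> float:
        return self.frequency * self.confidence


def chain_reliability(steps: list) -> float:
    """Reliability of an inference chain: the product of its steps' reliabilities."""
    return prod(step.reliability for step in steps)


def worth_expanding(steps: list, goal_importance: float, cost: float) -> bool:
    """Hypothetical test: extend the chain only if expected benefit exceeds cost."""
    return chain_reliability(steps) * goal_importance > cost
```

Because reliability is multiplied at every step, long chains become progressively less worth extending unless the goals they serve are important enough, which is the quantitative basis for attention described above.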


8.2 Reference Architecture 


The ultimate purpose of SCL is to achieve end-to-end consistency of the feedback 
loops.² This is enabled by endogenous situatedness as per Prop-3 of Sect. 7.3, which
(1) preserves, anytime, both relevance and correctness, despite the cognitive load 
possibly overwhelming the inevitably limited computational resources, while (2) 


2 Feedback loops are manifested both at the macro-scale, coupling system and environment, and at 
the micro-scale (i.e. intra-system), coupling inference chains—formally, the composition of lenses, 
defined in Sect. 9.4. 
Fig. 8.1 SCL reference system architecture for end-to-end semantic closure. The executive orches-
trates a number of asynchronous processes (in black) and provides interfaces to external sensing/
actuation data streams (also asynchronous). Scheduling is a resource-aware goal-directed emergent
process. Rules and states in grey denote axioms. The picture does not distinguish between state
modalities (past, present, assumed, predicted, or desired). States in blue pertain to the world, states
in red reflect the reasoning process, and states of the executive (memory usage, energy consumption,
etc.) are represented in black. The write operations performed by actuators on the workspace carry
efference copies; see text for details.


deriving such adaptive behavior from the goals at hand, in light of (3) explicit repre- 
sentations of the physical and cognitive state of the system. 

Figure 8.1 gives an overview of a reference system architecture. The architecture 
consists of two main subsystems: a workspace and an executive. The workspace con- 
tains learned relational knowledge (implemented by rules) and a representation of 
both world and system states—be they actual (past or present), hypothetical (assumed 
or predicted), or desired (goals and constraints). Regarding ‘relational knowledge’, 
we move away from the vocabulary of causality, which is framed in terms of ele- 
mentary changes to the ‘wiring diagram’ of hypothesis chains, as per Pearl’s ‘do 
operator’. Instead, in common with the emerging discipline of behavioural control 
[365] (see Sect. 11.2.3), we adopt the relational perspective, which is concerned 
with the invariants propagated by a relation, as per the notion of contract [245]. This 
offers a more prescriptive framework that is therefore better suited to describing 
state spaces, particularly those where rules are partially instantiated (e.g. input terms 
contain wildcards or constraints)—see Sect. 10.1. 

The consistency of the workspace is maintained by the executive which, besides 
providing interfaces to sensors and actuators, consists essentially of a scheduler and 
multiple instances of a universal interpreter. In broad terms, the architecture can 
be considered a massive fine-grained production system [288, 300], in which huge 
numbers of rules fire concurrently and asynchronously. A small fraction of the
workspace rules may be provided a priori, while the vast majority are produced and
maintained dynamically. In addition to rules, the workspace contains states; axioms
are also accommodated. Each inferred state has an associated time horizon and an
estimate of its likelihood. States qualify the world, the deliberations of the
system (‘trains of thought’), and the system itself as embodied in the world (manifested
as memory expenditure, performance profile, physical agency and integrity, etc.).
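The kinds of workspace entries described above can be sketched as a simple record type. This is a minimal illustration only: the class names, the split into Domain and Modality (which mirrors the colour coding and state modalities of Fig. 8.1), and the (start, end) encoding of the time horizon are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    WORLD = "world"          # states pertaining to the world
    REASONING = "reasoning"  # 'trains of thought'
    SYSTEM = "system"        # the embodied system: memory, energy, integrity, ...


class Modality(Enum):
    OBSERVED = "observed"    # actual: past or present
    ASSUMED = "assumed"      # hypothetical
    PREDICTED = "predicted"  # hypothetical
    DESIRED = "desired"      # goals and constraints


@dataclass(frozen=True)
class State:
    content: tuple            # placeholder for a typed expression
    domain: Domain
    modality: Modality
    horizon: tuple            # (start, end) time interval over which the state holds
    likelihood: float = 1.0   # estimate attached to inferred states

    def holds_at(self, t: float) -> bool:
        start, end = self.horizon
        return start <= t <= end


def select(workspace, domain, modality):
    """Query the workspace for states of a given domain and modality."""
    return [s for s in workspace if s.domain is domain and s.modality is modality]
```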

Sensors and actuators constitute the physical interface of the system to the world, 
and as such, must be amenable to modeling. For this reason, actuators write efference 
copies in the workspace. An efference copy is an explicit state which encodes the 
action that was actually executed (by the ‘body’) in response to the request (by
the ‘mind’) for a desired action: the error between the two allows the system to
model the contingencies of its embodiment, i.e., capabilities, limitations and costs
(including response latency, energy expenditure, wear and tear, etc.) in a feedback
loop generalizing the biologically plausible model of motor control proposed by
Wolpert [136]. Errors are useful for learning ‘body models’ at the edge (‘inside-out’),
but that is not their only use: there is an inherent dual (and equally important) way
to make sense of errors. When body models are well established (i.e., vindicated by significant
experience), errors change their meaning: they then signify hidden world states. 
To take the example used by Wolpert, assuming a carton of milk is full, one will 
exert a rather strong torque on one’s limbs to lift it, only to overshoot the intended 
end location in case the carton turns out to be empty. In other words, a misaligned 
efference copy defeats an invalid assumption (a case of abduction). The symmetric 
feedback loop (i.e. the one pertaining to sensors) is similar, the main distinction being 
that the inherent duality of errors is operationally mirrored. Sensing errors are signals 
for learning world models at the edge (‘outside-in’) in response to the invalidation of 
predictions (a case of deduction). Conversely, when the stationarity of world models 
is well established (i.e. the reliability of world models is predicted reliably, vindicated 
by significant experience) sensing errors also change their meaning: they then signify 
the failure of the sensing apparatus. 
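The dual reading of errors described above can be summarized as a small decision procedure. This is a hedged sketch, not the book's mechanism: requests, efference copies, predictions and readings are reduced to scalars, and the thresholds ESTABLISHED and TOLERANCE, like the function names, are hypothetical.

```python
from enum import Enum

ESTABLISHED = 0.9  # hypothetical reliability above which a model counts as 'well established'
TOLERANCE = 1e-3   # hypothetical threshold below which a discrepancy is ignored


class Reading(Enum):
    NO_ERROR = "no error"
    LEARN_BODY_MODEL = "refine the body model (inside-out)"
    HIDDEN_WORLD_STATE = "abduce a hidden world state"
    LEARN_WORLD_MODEL = "refine the world model (outside-in)"
    SENSOR_FAILURE = "suspect the sensing apparatus"


def read_actuation_error(requested: float, executed: float,
                         body_model_reliability: float) -> Reading:
    """Compare a requested action with its efference copy."""
    if abs(requested - executed) <= TOLERANCE:
        return Reading.NO_ERROR
    if body_model_reliability >= ESTABLISHED:
        return Reading.HIDDEN_WORLD_STATE  # e.g. the carton of milk was empty
    return Reading.LEARN_BODY_MODEL


def read_sensing_error(predicted: float, sensed: float,
                       world_model_reliability: float) -> Reading:
    """Compare a prediction with what the sensors actually report."""
    if abs(predicted - sensed) <= TOLERANCE:
        return Reading.NO_ERROR
    if world_model_reliability >= ESTABLISHED:
        return Reading.SENSOR_FAILURE
    return Reading.LEARN_WORLD_MODEL
```

The two functions are deliberately symmetric: the same discrepancy is read as a learning signal while the relevant model is immature, and as evidence about something else once that model is well established.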

Each rule acts as a match-and-transform schema for typed expressions, encoding
patterns that denote subregions $S_1, S_2$ of a state space $S$, together with a transfer
function $t : S_1 \to S_2$. The state space is dynamic: additional dimensions may be
synthesized (e.g. via abstraction) at any time by some rules and subsequently also 
be available for matching and transformation; least recently significant dimensions 
may conversely be deleted. Matching binds values (states, rules or transformation 
events) to rules, whereas transforming combines several such bindings to produce 
new values (along with a transformation event as a side effect). The allocation of 
resources for transforming available bindings is prioritized by the scheduler; the most 
promising bindings will receive computational attention first. 
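A rule of this kind and the decoupling of matching from transformation can be sketched as follows. The sketch assumes that states are plain numbers, that the subregion $S_1$ is given by a membership predicate, and that a job's priority is simply the likelihood of the bound value times the rule's reliability (the fuller utility estimate also weighs goals). Rule, Job, match and transform_next are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[Any], bool]   # membership test for the subregion S1
    transfer: Callable[[Any], Any]   # the transfer function t : S1 -> S2
    reliability: float = 0.5


@dataclass(order=True)
class Job:
    sort_key: float                  # negated priority: heapq is a min-heap
    seq: int                         # tie-breaker preserving submission order
    rule: Rule = field(compare=False)
    binding: Any = field(compare=False)


def match(rules, values, counter):
    """Matching: bind each (value, likelihood) pair to every rule whose pattern it fits."""
    return [
        Job(-likelihood * rule.reliability, next(counter), rule, value)
        for value, likelihood in values
        for rule in rules
        if rule.matches(value)
    ]


def transform_next(queue):
    """Transformation: execute the most promising pending job."""
    job = heapq.heappop(queue)
    return job.rule.name, job.rule.transfer(job.binding)
```

For example, with a reliable 'heat' rule and an unreliable 'cool' rule, matching two temperature readings yields two jobs, and the heat job is transformed first because its binding is more promising.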

Matching and transformation of SCL expressions is performed via instantiations 
of a universal interpreter. Paths through the state space are composed via the appli- 
cation of interpreter instances and compound SCL inference methods (as introduced 
in Chap. 7 and discussed in further detail in Sects. 9.2–9.4) to produce inferences
(deductive, abductive, or analogical) and state-space dimensions. This interpretation
is explicitly compositional and imposes a denotational semantics on expressions.
The unfolding application of the universal interpreter bears witness to the sys-
tem’s internal state of deliberation: such ‘trains of thought’ are considered first-class
citizens and are duly recorded in the workspace.

The learned ruleset constitutes a fine-grained representation of the entire transition 
function of the world model. In contrast, axiomatic rules have a different purpose: to 
implement heuristics for rule and type construction and maintenance. For example, 
some rules seek to “carve Nature at its joints” [127], by identifying ‘surprising’ 
world states such as the failure of a prediction or the unpredicted achievement of 
a goal. In either case, corrective rule synthesis is applied to the ruleset: this can 
be considered to impose learned pragmatics on the default denotational semantics. 
Conversely, the absence of surprise vindicates parts of the world model and the 
rules that implement them: these rules will be deemed more reliable than others 
that see repeated failures. For completeness, other axiomatic rules include seeking 
out candidate patterns as a basis for constructing abstractions and analogies, as well 
as identifying opportunities for generating improved hypotheses. Rule synthesis is 
addressed extensively in Chap. 10. 
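The detection of ‘surprise’ described above can be sketched as a comparison of predicted, observed and desired states. In this minimal illustration, states are hashable labels, predictions record which rule produced them, and per-rule successes and failures are simple counters. The function name and this bookkeeping are assumptions; actual corrective rule synthesis (Chap. 10) is not modelled.

```python
from collections import Counter


def assess(predictions: dict, observed: set, goals: set,
           successes: Counter, failures: Counter) -> list:
    """Detect 'surprising' outcomes and update per-rule track records in place.

    predictions maps each predicted state to the name of the rule that produced it.
    Returns the surprises that would trigger corrective rule synthesis.
    """
    surprises = []
    for state, rule in predictions.items():
        if state in observed:
            successes[rule] += 1  # no surprise: the rule is vindicated
        else:
            failures[rule] += 1
            surprises.append(("failed prediction", state, rule))
    # Goals that were achieved without having been predicted are also surprising.
    for state in sorted((observed & goals) - set(predictions)):
        surprises.append(("unpredicted goal achievement", state, None))
    return surprises
```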

Computing resources are inevitably limited and a system must allocate these 
wisely to cater to arbitrary influxes of inputs and goals. The reference architecture 
decouples matching from transformation and treats them accordingly as two dis-
tinct classes of processes. Matching processes produce requests for transformation,
termed ‘jobs’, each of which is associated with a quantitative estimate of its utility
with respect to all goals at hand, system-wide. This estimate is based on three main
factors: (1) the likelihood of the matched value, which is a function of the reliability
of the (chain of) rules that produced it; (2) the reliability of the matching rule; and
(3) the relative importance of the goals that may be served (or opposed) by the result
of the job. The estimate is revised continually as premises, rules, and goals vary not only
in quality (the geometry of the subregions they occupy in the state space) but also
quantitatively, in their reliability and relevance. Estimates of utility are primar-
ily used for prioritizing jobs for execution, i.e., ultimately, to schedule the execution of
transformations competing for resources.
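One way to combine the three factors listed above is sketched below. The multiplicative form and the signed encoding of goal effects are assumptions made for illustration; the text specifies only which factors the estimate depends on.

```python
def job_utility(value_likelihood: float, rule_reliability: float,
                goal_effects: list) -> float:
    """Utility of a job from the three factors: likelihood, reliability, goal relevance.

    goal_effects holds (importance, effect) pairs, where effect is +1 if the job's
    result would serve that goal and -1 if it would oppose it.
    """
    relevance = sum(importance * effect for importance, effect in goal_effects)
    return value_likelihood * rule_reliability * relevance
```

Since premises, rules and goals keep changing, such an estimate would be recomputed whenever its inputs change rather than cached once per job.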

The role of the scheduler in the SCL architecture is merely to re-order jobs contin- 
ually according to their priorities. For example, previously unattended jobs may jump 
ahead of new ones as their prospective benefits become apparent, whereas hopelessly 
unpromising jobs are eventually deleted. This particular scheduling design confers 
three essential benefits. First of all, it avoids the scalability issues inherent in stan-
dard job-scheduling theory. The scheduler does not perform explicit allocation but
instead slows down relatively less beneficial inference chains, exploiting their fine
granularity to its full extent. In the standard sense of the term, this is
hardly a scheduler at all: SCL scheduling is more a distributed emergent pro-
cess than an algorithm. Second, this particular design enforces a fundamental
property of the system architecture, namely endogenous situatedness. It achieves this
by imposing semantically-grounded control over the execution of the expression
transformers: control is based on up-to-date predictions of functional costs and bene-
fits, and balances the cognitive load for the best use of time (deadlines) and available
(possibly varying) computing power. Last but not least, semantically-grounded con-
trol implements an implicit attentional process: centralized attention algorithms are
notorious not only for hindering scalability but also for undermining anytime responsive-
ness. Instead, an SCL system keeps the cost of attention constant by amortizing the
computation of scheduling priorities over the continual transformations of expres-
sions.
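The ordering behaviour of such a scheduler (continual re-ordering, jobs jumping ahead as their prospects improve, deletion of hopeless jobs) can be illustrated with a deliberately centralized toy. This is only a sketch of the ordering semantics under stated assumptions: in SCL the process is distributed and amortized, whereas here a single object re-scores every job, and the class name and the drop_below cut-off are hypothetical.

```python
class ReorderingScheduler:
    """Keeps pending jobs ordered by their current utility; allocates nothing explicitly."""

    def __init__(self, utility, drop_below=0.01):
        self.utility = utility        # callable: job -> up-to-date utility estimate
        self.drop_below = drop_below  # hypothetical cut-off for 'hopelessly unpromising' jobs
        self.pending = []

    def submit(self, job):
        self.pending.append(job)

    def reorder(self):
        """Re-score every pending job, delete hopeless ones, sort the rest (best first)."""
        scored = [(self.utility(job), job) for job in self.pending]
        kept = [(u, job) for u, job in scored if u >= self.drop_below]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        self.pending = [job for _, job in kept]

    def next_job(self):
        self.reorder()
        return self.pending.pop(0) if self.pending else None
```

Because utilities are read afresh on every reorder, an older job whose estimated benefit rises can overtake a newer one without any explicit allocation step.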

The embodiment of the system also imposes limits on the size of its workspace. 
Although the reference architecture does not impose any specific heuristics to keep 
the growth of the workspace in check, we mention here a few possibilities. A rule 
can be evicted on the basis of its reliability: a general trend downwards indicates 
irrelevance and warrants deletion. There are also other reasons why an otherwise 
reliable rule can become irrelevant, for example, a temporary change of the stationary 
regime of the environment or a change in the agent’s mission. In such cases, deletion 
should be avoided (the system may need to revert to previous conditions) and the 
rule in question should instead be swapped out to some larger auxiliary storage at
the cost of incurring extra latency upon its recall (swap in). Whenever a chain of
abduction halts before reaching its goal, a swap-in request is issued, taking the missing
rule signature as its argument. If no rule is returned, then the system has
found a ‘gap’ in the world model structure, which triggers learning. Inferred states, as
we have seen, are qualified by their likelihood; they can be furthermore qualified by 
their utility: the utility of a prediction increases when its target state matches that of a 
goal and vice-versa. Low values for both the likelihood and utility can meaningfully 
trigger the eviction of an inference. Finally, possible heuristics for evicting observed 
states from the workspace include LRU (least recently used), age, and number of 
references. 
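The eviction heuristics mentioned above can be sketched as follows. The policy is hypothetical: a rule whose last few reliability estimates strictly decrease is deleted, a reliable rule that is currently irrelevant is swapped out, a swap-in that returns nothing signals a gap, and observed states use LRU eviction via an OrderedDict. Names, the window size and the dictionary used as auxiliary storage are assumptions.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class RuleRecord:
    signature: str
    reliability_history: list = field(default_factory=list)  # oldest first
    relevant_now: bool = True  # False e.g. after a regime or mission change


def eviction_decision(rule: RuleRecord, window: int = 3) -> str:
    """Hypothetical policy returning 'delete', 'swap_out' or 'keep'."""
    recent = rule.reliability_history[-window:]
    trending_down = len(recent) == window and all(a > b for a, b in zip(recent, recent[1:]))
    if trending_down:
        return "delete"    # a general downward trend indicates irrelevance
    if not rule.relevant_now:
        return "swap_out"  # still reliable: keep it in auxiliary storage
    return "keep"


def swap_in(signature: str, auxiliary_store: dict):
    """Recall a swapped-out rule; None signals a gap in the world model."""
    return auxiliary_store.pop(signature, None)


def evict_lru(observed_states: OrderedDict, capacity: int) -> None:
    """LRU eviction for observed states (callers move_to_end on each access)."""
    while len(observed_states) > capacity:
        observed_states.popitem(last=False)
```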
A couple in love walking along the banks of the Seine are, in real
fact, a couple in love walking along the banks of the Seine, not 
mere particles in motion. 


Stewart A. Kauffman [168] 


The system architecture presented in Chap. 8 controls (i.e., sustains and constrains) 
the invocation of the inference methods introduced in Chap. 7. In this chapter, we 
describe the methods of higher-level inference in more detail. There is increasing
conviction that methods of category theory are well-suited for providing generic 
descriptions for cognitive science, general intelligence [258], and control [42], the 
latter being newly christened as ‘categorical cybernetics’. In this chapter, we describe 
how to leverage the power of selected category-theoretic constructions to realize SCL 
operations in a compositional manner. 


9.1 Categorical Cybernetics 


Category theory has served as a formidable unifying mechanism in mathematics. 
It was devised in the 1940s by Eilenberg and Mac Lane [209] in order to provide a
higher-order vocabulary for algebraic topology and defines a principled setting for 
the study of structurally-informed transformations. Amongst applied category theo- 
rists there is increasing interest in machine learning applications, and it is becoming 
apparent (e.g. [78, 90, 318]) that much can be done to unify and generalize exist- 
ing methods. Following a brief overview of category theory, we describe how SCL 
operations may be implemented in terms of specific category-theoretic construc- 
tions. Below, we describe the essential concepts; more detail is available in various 
excellent texts (e.g. [196, 264, 290]). Mathematical approaches to formalizing com- 
positionality can be broadly divided into ‘syntactic’ and ‘semantic’. While syntactic 
approaches such as those in linguistics [361] and process algebra [224] might be
better known, the semantic approach via category theory has become increasingly 
popular in diverse fields due to its flexibility and mathematical elegance [89]. 

A category is a two-sorted algebraic structure rather like a graph, consisting of
‘objects’ and ‘morphisms’: every morphism has a source object and a target object,
written $f : X \to Y$. In addition to the graph-like structure, there is a way to compose
morphisms: given any morphisms $f : X \to Y$ and $g : Y \to Z$ with the target of one
agreeing with the source of the other, there is a composite morphism $fg : X \to Z$
(typically written $g \circ f$). This must satisfy the ‘associativity property’ familiar from
algebraic structures such as groups, i.e., given three composable morphisms $f, g, h$,
the two different ways of composing them must agree: $(fg)h = f(gh)$. Every object
must also have an assigned ‘identity morphism’ $1_X : X \to X$, and these must act as
the identity element for composition: $1_X f = f = f 1_Y$ for all $f : X \to Y$.
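In the category Set (objects are sets, morphisms are functions), composition and identities are the familiar ones. The sketch below uses the diagrammatic order $fg$ of the text and spot-checks the associativity and identity laws on a few sample points; the functions f, g and h are hypothetical, and a sampled check is an illustration, not a proof.

```python
def compose(f, g):
    """Diagrammatic composition fg : X -> Z, i.e. g ∘ f (apply f first, then g)."""
    return lambda x: g(f(x))


def identity(x):
    return x


def f(n: int) -> int:
    return n + 1


def g(n: int) -> int:
    return 2 * n


def h(n: int) -> int:
    return n - 3


samples = range(-5, 6)

# Associativity: (fg)h = f(gh)
associative = all(compose(compose(f, g), h)(x) == compose(f, compose(g, h))(x) for x in samples)
# Identities: 1_X f = f = f 1_Y
unital = all(compose(identity, f)(x) == f(x) == compose(f, identity)(x) for x in samples)
```

Note that order matters: compose(f, g) first adds one and then doubles, whereas compose(g, f) doubles first.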

There are many examples of categories, and we will name only a handful: 


1. The category Set of sets, whose objects are sets and morphisms are functions.

2. The category FinVec of finite-dimensional vector spaces, whose objects are finite-
dimensional vector spaces and morphisms are linear maps.

3. The category Rel of relations, whose objects are sets and morphisms are binary
relations.

4. For any graph G, there is the ‘free category on G’, whose objects are nodes of
G and morphisms are paths in G. Composition is concatenation of paths, and
identity morphisms are paths of length zero.

5. Any monoid $M$ can be viewed as a category with a single object $\ast$, where every
element $m \in M$ is viewed as a morphism $m : \ast \to \ast$.
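Example 4 can be made concrete in a few lines. In this sketch a morphism of the free category is a path recorded with its source, target and edge sequence; composition concatenates paths when the target of the first agrees with the source of the second, and identities are empty paths. The graph EDGES and the helper names are hypothetical.

```python
from dataclasses import dataclass

# A hypothetical graph G: edge name -> (source node, target node).
EDGES = {"a": ("X", "Y"), "b": ("Y", "Z"), "c": ("Z", "X")}


@dataclass(frozen=True)
class Path:
    """A morphism of the free category on G: a path from source to target."""
    source: str
    target: str
    edges: tuple = ()


def edge(name: str) -> Path:
    source, target = EDGES[name]
    return Path(source, target, (name,))


def identity(node: str) -> Path:
    return Path(node, node)  # the path of length zero


def compose(p: Path, q: Path) -> Path:
    """Concatenate p then q, provided the target of p is the source of q."""
    if p.target != q.source:
        raise ValueError(f"cannot compose {p.edges} with {q.edges}")
    return Path(p.source, q.target, p.edges + q.edges)
```

Associativity holds because tuple concatenation is associative, and the zero-length paths act as identities because concatenating an empty tuple changes nothing.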


Of particular relevance is the fact that typed programming languages (e.g. the 
simply-typed lambda-calculus [47] or intuitionistic type theory [217]) give rise to 
corresponding categories: it is possible¹ to consider a category with types as objects
and functions as morphisms [54]. 

When studying compositionality, it is typical to work in the context of ‘monoidal
categories’, which have additional structure: there is a monoid-like structure on
objects, with a binary operation $\otimes$ and a unit $I$, in a way that is also compatible
with morphisms, so if $f : X_1 \to Y_1$ and $g : X_2 \to Y_2$ are morphisms, then so is
$f \otimes g : X_1 \otimes X_2 \to Y_1 \otimes Y_2$, satisfying various laws. A category typically has sev-
eral different monoidal structures. For example, in the category of sets we could take
$\otimes$ to be Cartesian product of sets (whose unit is a 1-element set) or disjoint union of
sets (whose unit is the empty set). In the category of finite-dimensional vector spaces
we could take $\otimes$ to be the direct product (whose unit is the 0-dimensional space) or
the tensor product (whose unit is the 1-dimensional space).
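The two monoidal structures on sets mentioned above act on morphisms in different ways, as the following sketch shows. Disjoint union is encoded by tagging each element with 0 or 1 to record which summand it belongs to; this encoding and the function names are assumptions, and the units are not represented.

```python
def tensor_cartesian(f, g):
    """f ⊗ g when ⊗ is the Cartesian product: act componentwise on pairs."""
    def f_tensor_g(pair):
        x1, x2 = pair
        return f(x1), g(x2)
    return f_tensor_g


def tensor_disjoint_union(f, g):
    """f ⊗ g when ⊗ is disjoint union: elements carry a tag naming their summand."""
    def f_tensor_g(tagged):
        tag, x = tagged
        return (0, f(x)) if tag == 0 else (1, g(x))
    return f_tensor_g
```

For instance, with f = len and g = str.upper, the Cartesian version maps the pair ("abc", "xy") to (3, "XY"), whereas the disjoint-union version applies only one of the two functions, depending on the tag.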

Morphisms in a monoidal category are often represented using the graphical nota-
tion of string diagrams [216]. For example, if we have morphisms $f : X_1 \to Y_1$,
$g : X_2 \to Y_2$ and $h : Y_1 \otimes Y_2 \to Z$, then the composite morphism
$(f \otimes g)h : X_1 \otimes X_2 \to Z$ is represented by the diagram:


1 Modulo certain technical considerations.
[String diagram: wires $X_1$ and $X_2$ enter boxes $f$ and $g$ respectively; their output wires $Y_1$ and $Y_2$ enter box $h$, whose output wire is $Z$.]


The other basic concept required is a ‘functor’, which is a structure-preserving
map between categories. If $C$ and $D$ are categories, then a functor $F : C \to D$ is an
assignment sending every object $X$ in $C$ to an object $F(X)$ in $D$, and every morphism
$f : X \to Y$ in $C$ to a morphism $F(f) : F(X) \to F(Y)$ in $D$, in a way that preserves
identities and composition, i.e., $F(1_X) = 1_{F(X)}$ and $F(fg) = F(f)F(g)$. If our
categories are monoidal then we consider ‘monoidal
functors’, which also preserve the monoidal structure.
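A familiar example of a functor from Set to itself is the list construction: it sends a set X to lists of elements of X, and a function f to the function that applies f elementwise. The sketch below spot-checks, on one sample list, that this mapping preserves identities and (diagrammatic) composition; increment and double are hypothetical example morphisms.

```python
def compose(f, g):
    """Diagrammatic composition: apply f, then g."""
    return lambda x: g(f(x))


def identity(x):
    return x


def fmap(f):
    """Action of the list functor F on a morphism f : X -> Y, giving F(f) : F(X) -> F(Y)."""
    return lambda xs: [f(x) for x in xs]


def increment(n):
    return n + 1


def double(n):
    return 2 * n


xs = [0, 1, 2, 3]
preserves_identity = fmap(identity)(xs) == identity(xs)
preserves_composition = (fmap(compose(increment, double))(xs)
                         == compose(fmap(increment), fmap(double))(xs))
```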


9.2 Hypothesis Generation 


One of the most important requirements for general intelligence is that the hypotheses 
which are generated are salient, i.e., pertinent to the task at hand. In the case of the 
simple pendulum, the closed-form expression was obtained by virtue of the observation
that, e.g., the color of the pendulum bob was not relevant, but the angle of displacement
was, and so on. However, this does not imply that all features of the pendulum were 
given equal attention. Humans have sufficiently strong priors about force and motion 
that it is hard to imagine an experimenter ever consciously entertaining color as a 
factor. It is therefore evident that scientific hypothesis generation enjoys a degree of 
subtlety which is absent from traditional ML approaches. 

Previous work on the non-quantitative generation of alternative hypotheses can
be found in Mitchell and Hofstadter’s ‘Copycat’ [146]. Copycat is proposed as a 
cognitively-plausible generator of proportional analogies between letter-strings and 
operates without requiring a ‘top-down, a priori’ objective function. In the abstract, 
Copycat can be considered as an interpreter for expressions that describe (poten- 
tially partially constructed) analogies, in which top-down and bottom-up perspec- 
tives interact. At any point in construction, structures can be interpreted to yield 
what Hofstadter describes as their ‘counterfactual halo’, i.e., to suggest alternative 
expressions that tend to be more self-consistent. Copycat avoids explicit combina- 
torial search via a combination of attention heuristics (which share a great deal of 
commonality with the scheduling mechanism described in Sect. 8.2) and interacting 
hierarchical constraints. Salient actions are indexed ‘on demand’ by local updates to 
hypotheses and no form of global truth maintenance is required. These local updates 
act to greatly prune the space of possible alternatives, being biased by design in the 
general direction of ‘more consistent, hence more compelling’ hierarchies. 

More generally, a number of previous works in cognitive architectures argue that 
the frame problem is an artifact of an overly rigid perspective on hypothesis repre-
sentation [97, 146]. Specifically, the claim is that hypotheses should be both causally
grounded (via participation in a sensor-effector mapping that receives feedback from 
the environment) and ‘reified on demand’ via the data-driven context of task features. 
It is claimed that the arguments of a priori representationalism that lead to the frame 
problem are then no longer applicable. Such context-specific hypothesis-chaining is 
an intrinsic aspect of SCL: salience is facilitated via the joint action of fine-grained 
scheduling and resource bounds on both time and space [325]. In the following 
section, we describe a general mechanism for interpreting the structure of hypothe- 
ses, in which alternative hypotheses are generated via a specific form of interpretation 
which yields a modified hypothesis as a result. 


Compositional Interpretation 


Compositional interpretation of SCL-expressions is achieved via denotational seman-
tics: the meaning of a composite expression is interpreted as a function of the
meaning of its component parts. In a ‘closed-world’ setting, such an interpretation 
is fixed at the point of deployment. In an open world setting, it remains forever 
possible that there are surprising latent interactions between components and the 
interpretation process must therefore incorporate online learning. 

Marcus [210, 214] has previously contrasted deep learning with human capa- 
bilities (as inferred via observations from neuroscience), and notes that the former 
typically lacks generic mechanisms for recursion and variable binding. He proposes a 
hierarchical structured representation (‘treelets’) for addressing this, but leaves open 
the question of how they should be manipulated for efficient inference. As regards 
efficiency, Marcus gives desiderata which are evocative of the ‘active symbols’ of 
Mitchell and Hofstadter’s ‘Fluid Analogies’ architecture [146], the direct analog of 
which in SCL is provided via the matching and transformation processes described
in Sect. 8.2.

It has previously been proposed [259] that the category theoretic mechanism of 
initial F-algebras provides an appropriate and parsimonious means of modeling 
these aspects of human cognition and we describe below a concrete application 
that supports such recursion and variable binding. F-algebras provide a universal 
mechanism for the generic interpretation of expressions. By ‘generic’, we mean that 
a wide class of typed expression languages can be compositionally interpreted. As we 
describe below, ‘universal’ has a specific technical meaning in category theory, but 
can for practical purposes be taken to mean that the interpreter is parameterized by 
the target datatype and can approximate (e.g. via learning [333, 334]) any primitive 
recursive semantic interpretation.² Moreover, the interpreter can both accommodate
recursive expression languages and be stateful (and hence perform variable-binding, 
so meeting both of Marcus’s requirements, above). 


2 Strictly, the expressiveness is more general than primitive recursive in that it includes e.g. Acker- 
mann’s function [154]. 
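Fig. 9.1 h as a homomorphism [commuting square with top edge $Fh : FA \to FB$, vertical edges $f : FA \to A$ and $g : FB \to B$, bottom edge $h : A \to B$; i.e., $h \circ f = g \circ Fh$]

Fig. 9.2 Cata as a unique homomorphism [commuting square with top edge $F(\mathrm{Cata}\, f) : F(\mu F) \to FA$, vertical edges $in : F(\mu F) \to \mu F$ and $f : FA \to A$; i.e., $\mathrm{Cata}\, f \circ in = f \circ F(\mathrm{Cata}\, f)$]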


Technically, the universality property arises as follows: for category $C$ and functor
$F : C \to C$, an algebra³ is a pair $(A, f)$ consisting of an object $A$ and a
morphism $f : FA \to A$. A homomorphism $h : (A, f) \to (B, g)$ between algebras
is a morphism $h : A \to B$ such that the square in Fig. 9.1 commutes.

In the category that has algebras as objects and homomorphisms as morphisms,
an initial algebra is one from which there is a unique homomorphism to every other
algebra in the category; it is therefore unique up to isomorphism. We write $(\mu F, in)$ for
an initial algebra, and $\mathrm{Cata}\, f$ for the unique homomorphism $h : (\mu F, in) \to (A, f)$
from the initial algebra to any other algebra $(A, f)$. That is, $\mathrm{Cata}\, f$ is defined as the
unique arrow that makes the diagram of Fig. 9.2 commute.

The universal interpreter property of Cata then arises by virtue of this initiality
[154]. Cata is an abbreviation of ‘catamorphism’ (from Greek: kata ‘downwards’
and morphē ‘shape’); informally, a generic method of transforming some typed source
expression into a target expression (of whatever desired type). Hence, in a category
of expressions Ex in which the objects are types and the morphisms are functions,
Cata provides an algorithm template for the interpretation of expressions. Predicates
on Ex are then represented as transformations of a source expression into the
target type bool, and alternative hypotheses as transformations with target type Ex.
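A small sketch can make the ‘algorithm template’ concrete. Python has no native initial algebras, so the explicit structural recursion in cata below stands in for the unique homomorphism out of the expression datatype (which plays the role of μF), and a dictionary with one handler per constructor stands in for an algebra f : FA → A. The toy expression language (Lit, Add, Mul) and all names are hypothetical. The same template yields an evaluator (target type int), a predicate (target type bool), and an alternative hypothesis (target type Expr).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Add, Mul]


def cata(algebra: dict, expr: Expr):
    """Interpret expr bottom-up: children are interpreted first, then the algebra
    (one handler per constructor, playing the role of f : FA -> A) combines them."""
    if isinstance(expr, Lit):
        return algebra["lit"](expr.value)
    left, right = cata(algebra, expr.left), cata(algebra, expr.right)
    return algebra["add" if isinstance(expr, Add) else "mul"](left, right)


# Target type int: ordinary evaluation.
evaluate = {"lit": lambda n: n, "add": lambda a, b: a + b, "mul": lambda a, b: a * b}

# Target type bool: a predicate ("every literal is non-negative").
non_negative = {"lit": lambda n: n >= 0, "add": lambda a, b: a and b, "mul": lambda a, b: a and b}

# Target type Expr: an alternative expression (here: swap the two operators).
swap_operators = {"lit": Lit, "add": Mul, "mul": Add}
```

For example, applied to Add(Lit(2), Mul(Lit(3), Lit(5))), the evaluator yields 17, the predicate yields True, and swap_operators yields the alternative Mul(Lit(2), Add(Lit(3), Lit(5))), which evaluates to 16.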

As previously mentioned, in the wider context of general intelligence, such a
‘Closed World Assumption’ is insufficient: reality may always intrude to defeat prior
assumptions.⁴ Such exceptions arise as a function of the difference between expected
and observed states—whether because an action fails to yield the anticipated state, or 
else because some state of interest arises in an unanticipated manner. In the context 
of SCL, then, the term hypothesis recovers its traditional meaning as a tentative 
proposition about reality that remains forever potentially subject to revision. When 


3 This definition subsumes the familiar usage of the term. 
4 As discovered by Bertrand Russell’s unfortunate fictional chicken. 
the distributed transition function of the world model is determined to be in error
in this manner, a corresponding ‘repair’ must be applied to the current denotational 
semantics. Those repairs can either widen the constraints or suitably constrain the 
transfer function. 

In SCL, constraints are expressed via the predicate of a refinement type. As types 
are aggregated into composites (via sum and product), so the hierarchical structure of 
predicates becomes more complex. In the particular case of hypothesis generation, the 
catamorphic traversal of expressions accumulates proposed alternatives and performs 
‘conflict resolution’ on them in order to propose a hypothesis that better meets its 
constraints. Any inference mechanism could conceivably be applied in the process, 
depending on the intermediate mappings between types that are required. Fortunately, 
the repair process is considerably facilitated by the granular nature of inference: it is 
typical that repairs require only local changes to inference rules. 
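The idea that a catamorphic traversal can accumulate alternatives and resolve conflicts between them can be illustrated with a toy repair procedure. In this sketch, expressions are nested tuples, each node proposes alternatives built from its children's alternatives plus one local edit of its own, and ‘conflict resolution’ picks the least-edited alternative whose value satisfies a constraint. The traversal is written with direct recursion for brevity. The local edits (literal ±1, swapping + and ×), the edit-count cost and all names are hypothetical. Alternatives are enumerated exhaustively, which grows exponentially; pruning by salience and scheduling is not modelled.

```python
from itertools import product

# Expressions as nested tuples: ("lit", n), ("add", l, r), ("mul", l, r).


def alternatives(expr):
    """Bottom-up traversal proposing alternatives for every subexpression.

    Returns (candidate, cost) pairs, where cost counts the local edits applied.
    The local edits (literal +/- 1, swapping + and *) are hypothetical.
    """
    if expr[0] == "lit":
        n = expr[1]
        return [(expr, 0), (("lit", n + 1), 1), (("lit", n - 1), 1)]
    op, left, right = expr
    other = "mul" if op == "add" else "add"
    proposals = []
    for (l, cost_l), (r, cost_r) in product(alternatives(left), alternatives(right)):
        proposals.append(((op, l, r), cost_l + cost_r))
        proposals.append(((other, l, r), cost_l + cost_r + 1))
    return proposals


def evaluate(expr):
    if expr[0] == "lit":
        return expr[1]
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "add" else a * b


def repair(expr, constraint):
    """Conflict resolution: the least-edited alternative whose value meets the constraint."""
    valid = [(alt, cost) for alt, cost in alternatives(expr) if constraint(evaluate(alt))]
    return min(valid, key=lambda pair: pair[1])[0] if valid else None
```

A hypothesis that already meets its constraint is returned unchanged (cost zero); otherwise the cheapest local change wins, in line with the observation that repairs typically require only local changes.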

It is also interesting to note the overlap here with the contemporaneous work of 
Goertzel [116], which proposes chronomorphisms as a potential unifying mechanism 
for the OpenCog framework. While we certainly share the belief that the zoo of
recursion schemes (including anamorphisms, futumorphisms, etc.) has a role to
play in open-ended inference, we do not believe that their role lies in solving
optimization problems, as discussed in Sect. 5.1 on a priori rewards.


9.3 Abstraction and Analogy 


Consensus on the value of abstraction in AI dates back to the inception of the field 
[220]. Various forms of analogical reasoning have similarly been widely argued, by 
cognitive and computer scientists alike [84, 144], to play a vital role. There is a wide
variety of proposed definitions for both. For example, Gentner [107] defines abstrac-
tion as ‘the process of decreasing the specificity of a concept’. Cremonini et al. [53]
define abstraction as ‘the process of mapping between problem representations, so 
as to simplify reasoning while preserving the essence of the problem’. Definitions 
of abstraction and analogy can overlap considerably. For example, the formal and 
general abstraction framework of Giunchiglia and Walsh [111] describes abstraction 
in terms of properties that are provably preserved under the source to target map- 
ping. This definition could also be said to be applicable to predictive analogy [271], 
which is concerned with inferring properties of a target object as a function of its 
similarity to a source object, the oft-cited example of which is the similarity between 
the solar system and the Rutherford model of the atom. Given the perceived richness 
and complexity of abstraction and analogy, this overlap is unsurprising. Indeed, it 
seems possible that the processes are recursively interleaved in a nontrivial and data- 
driven manner. Hence, whilst in this section we propose a concrete mechanism for
abstraction that can then be used as a basis for analogy, this should be considered as 
a pedagogical device rather than any attempt at a definitive statement. 

The interpretation of SCL expressions via catamorphisms already provides an ele- 
mentary abstraction mechanism: the algorithm skeleton for the interpreter abstracts 
over the generation of alternative types and values from a given expression. However, 
Fig. 9.3 Anti-unification as a categorical product. The anti-unifier of expressions $e_1$ and $e_2$ is an
expression $u$, together with two substitutions $\sigma_1$ and $\sigma_2$. For $s \xrightarrow{\sigma} t$, we say that $t$ is a specialization
of $s$, i.e. it has been obtained from $s$ via the instantiation of one or more variables. The least
general anti-unifier $(u, \sigma_1, \sigma_2)$ is the unique $u$ such that, for any other candidate $(u', \sigma_1', \sigma_2')$, $u$ is a
specialization of $u'$ via some substitution $\sigma'$.


since expressions in SCL are first-class objects, we may also perform abstraction via 
other means, such as anti-unification. Anti-unification has a variety of uses in program 
analysis, including invariant generation and clone detection [39]. The various forms 
of unification [266] can be described categorically [119] via a category in which 
the objects are terms (i.e. SCL expressions) and the morphisms are substitutions, 
i.e., the binding of one or more variables to subexpressions. Anti-unification is the 
categorical dual of unification [162], representing the factorization of substructure 
common to two expressions. The discovery of such ‘abstract patterns’ is analogous 
to the induction of subroutines, which can be instantiated across a range of parameter 
values. More generally, abstraction is applicable across different dimensions of the 
state space—see Sect. 11.2.2 for a discussion of wider prospects in this regard. 

Figure 9.3 depicts anti-unification as a categorical product, a construction that 
generalizes the familiar notion of Cartesian product. The diagram denotes that the
anti-unifier $(u, \sigma_1, \sigma_2)$ is more specialized than all other candidates $(u', \sigma'_1, \sigma'_2)$ because
the latter can be recovered from the former via $\sigma'$.
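To make the idea concrete, the sketch below implements the classic syntactic anti-unification (least general generalization) algorithm for first-order terms. It is a minimal illustration under stated assumptions, not the SCL implementation: expressions are encoded as nested Python tuples `(functor, arg1, ...)` with string constants, introduced variables are strings beginning with `?X`, and the names `anti_unify` and `substitute` are hypothetical. Reusing one variable for each distinct pair of disagreeing subterms is what makes the result least general.

```python
from itertools import count

# Terms: a constant is a str; a compound term is a tuple (functor, arg1, ..., argN).
# Variables introduced by anti-unification are strs starting with "?X".


def anti_unify(s, t):
    """Return (u, sigma1, sigma2), where u is the least general generalization of
    s and t, substitute(u, sigma1) == s and substitute(u, sigma2) == t."""
    table = {}               # (subterm of s, subterm of t) -> shared variable
    sigma1, sigma2 = {}, {}
    fresh = count()

    def go(a, b):
        if a == b:
            return a
        same_head = (isinstance(a, tuple) and isinstance(b, tuple)
                     and a[0] == b[0] and len(a) == len(b))
        if same_head:
            return (a[0],) + tuple(go(x, y) for x, y in zip(a[1:], b[1:]))
        if (a, b) not in table:          # reuse the variable for a repeated disagreement
            var = f"?X{next(fresh)}"
            table[(a, b)] = var
            sigma1[var], sigma2[var] = a, b
        return table[(a, b)]

    return go(s, t), sigma1, sigma2


def substitute(term, sigma):
    """Instantiate the variables of term according to the substitution sigma."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(x, sigma) for x in term[1:])
    return sigma.get(term, term)


# Letter-strings as cons-lists: abc and abd differ only in the last letter.
abc = ("cons", "a", ("cons", "b", ("cons", "c", "nil")))
abd = ("cons", "a", ("cons", "b", ("cons", "d", "nil")))
u, s1, s2 = anti_unify(abc, abd)
print(u)   # ('cons', 'a', ('cons', 'b', ('cons', '?X0', 'nil')))
```

Here `u` is the ‘abstract pattern’ common to both strings, and `s1`, `s2` play the roles of $\sigma_1$ and $\sigma_2$.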

Analogical reasoning in humans appears to afford unbounded creative potential 
[95]. We share the belief that analogy is a dominant mechanism in human cognition 
[148, 228] and envisage that computational models of analogy will be a key research 
area for general intelligence. It should be clear that such research is completely 
open-ended. The categorical approach we describe below is therefore for pedagog- 
ical purposes; the further incorporation of heuristics is a more realistic prospect for 
practical use. 

We give a categorical construction for proportional analogies which builds upon 
the method of abstraction defined above. 

Fig. 9.4 Proportional analogy as a commutative diagram: abc : abd :: ijk : ???

As previously described in Sect. 7.4, the example application domain we consider
here is that of letter-string analogy (LSA) problems (e.g., abc : abd :: ijk : ???).
Although the domain may appear simple, it has been remarked that it can
require considerable sophistication to obtain solutions that appear credible to humans
[95], not least because the domain is not readily modeled as an optimization problem.
It is therefore reasonable to assume that mechanisms for solving LSA problems have 
high relevance and are applicable for implementing cognitive processes at many lev- 
els. Notable approaches to LSA problems include Hofstadter and Mitchell’s Copycat 
[146] and the ‘E-generalization’ approach of Weller and Schmid [360], although the 
latter is not cognitively plausible for reasons of scale. 

As can be seen in Fig. 9.4, proportional analogy problems can also be consid- 
ered to form a commutative diagram. The ‘abstraction via anti-unification’ approach 
described above can be used as a building block for constructing analogies, for exam- 
ple as is done in ‘Heuristic Driven Theory Projection’ [308]. In particular, abstraction 
can be combined with the powerful category-theoretic constructions of pushouts and
pullbacks to construct, express, and understand such analogies in a computationally
automatable manner across a wide range of expression languages. Specifically,
we can use these constructions to determine the possible relationships between our
objects A = abc, B = abd and C = ijk such that D = ijl is uniquely determined
through commutative diagrams.⁵


Pushouts 


A pushout may be understood as an ‘abstract gluing’ of a pair of morphisms
$b : A \to B$ and $c : A \to C$. The construction of a pushout involves the construction
of some fourth object $D$ alongside morphisms $c' : B \to D$ and $b' : C \to D$
such that the resulting diagram commutes, i.e., $c' \circ b = b' \circ c$. Further, the
resultant commuting diagram must satisfy the ‘pushout condition’: for all $D'$ with
$e : B \to D'$ and $f : C \to D'$ such that $e \circ b = f \circ c$, there exists a unique
morphism $d : D \to D'$ such that $d \circ c' = e$ and $d \circ b' = f$. Consequently, the
pushout is unique up to isomorphism.

The meaning of these conditions will become clearer through an example
of a pushout in the category of sets (typically denoted Set). In Set, objects are
sets and morphisms are functions. Below we give an example of a pushout in Set,
with objects $A = \{a, b, c\}$, $B = \{1, 2, 3, 4\}$ and $C = \{W, X, Y, Z\}$, and morphisms
$h = \{a \mapsto 1, b \mapsto 2, c \mapsto 3\}$ and $v = \{a \mapsto X, b \mapsto Y, c \mapsto Z\}$ (here $h$ and $v$ play the roles of
$b$ and $c$ above). For each element of $B$, $C$ and $D$, we denote its pre-image in $A$ in parentheses.


5 The possibility D = ijd is not treated here; a more fully-featured approach to analogy would 
include nondeterminism and preference heuristics, e.g. as in Goguen’s ‘sesqui-categories’ [117]. 
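$$
\begin{array}{ccc}
A = \{a, b, c\} & \xrightarrow{\;h\;} & B = \{1(a), 2(b), 3(c), 4\} \\
\Big\downarrow{\scriptstyle v} & & \Big\downarrow{\scriptstyle v'} \\
C = \{W, X(a), Y(b), Z(c)\} & \xrightarrow{\;h'\;} & D = \{W, 1X(a), 2Y(b), 3Z(c), 4\}
\end{array}
$$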


The resultant object $D = \{W, 1X, 2Y, 3Z, 4\}$ with morphisms $v' = \{1 \mapsto 1X,
2 \mapsto 2Y, 3 \mapsto 3Z, 4 \mapsto 4\}$ and $h' = \{X \mapsto 1X, Y \mapsto 2Y, Z \mapsto 3Z, W \mapsto W\}$ is
the pushout of the morphisms $h$ and $v$, unique up to isomorphism.

The uniqueness is a result of the pushout condition; if we instead had $D' =
\{1X, 2Y, 3Z, 4W\}$ with $v'(4) = 4W$ and $h'(W) = 4W$, then we would have elements
in $B$ and $C$ with no common pre-image in $A$ being mapped to the same element
in $D'$. As a result, there would be no morphism $d : D' \to D$ such that the resultant
diagram commutes, as the offending element $4W$ would need to be mapped to two
separate elements in $D$, namely $4$ and $W$. Thus the pushout condition prevents
‘confusion’: if an element in $D$ has pre-images in both $B$ and $C$, then those
pre-images must themselves have a common pre-image in $A$.

Similarly, if we have $D' = \{W, 1X, 2Y, 3Z, 4, Q\}$, such that the element $Q$ has no
pre-image in $B$ or $C$, then $Q$ may be mapped to any element in $D$ by some morphism
$d' : D' \to D$, so that $d'$ is no longer unique. Thus the pushout
condition prevents ‘junk’: every element in $D$ must have a pre-image in at
least one of $B$ or $C$. A common phrase describing this property is that $v'$ and $h'$ are
jointly surjective.
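The pushout in Set can be computed directly: take the disjoint union of $B$ and $C$ and identify $h(a)$ with $v(a)$ for every $a \in A$. The Python sketch below does this with a union-find structure; the function name `pushout` and the encoding of elements of $D$ as frozensets of tagged pairs such as `("B", 1)` are illustrative choices, not taken from the text. It reproduces the example above and checks that the square commutes, that there is no junk, and that there is no confusion.

```python
def pushout(A, B, C, h, v):
    """Pushout in Set of h: A -> B and v: A -> C (given as dicts).

    D is the disjoint union B + C quotiented by h(a) ~ v(a) for every a in A.
    Each element of D is a frozenset of tagged elements ("B", b) / ("C", c).
    Returns (D, v_prime, h_prime) with v_prime: B -> D and h_prime: C -> D.
    """
    parent = {("B", b): ("B", b) for b in B}
    parent.update({("C", c): ("C", c) for c in C})

    def find(x):                          # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in A:                           # glue the two images of each a
        parent[find(("B", h[a]))] = find(("C", v[a]))

    roots = {x: find(x) for x in list(parent)}
    classes = {}
    for x, r in roots.items():
        classes.setdefault(r, set()).add(x)
    cls = {x: frozenset(classes[r]) for x, r in roots.items()}

    D = set(cls.values())
    v_prime = {b: cls[("B", b)] for b in B}
    h_prime = {c: cls[("C", c)] for c in C}
    return D, v_prime, h_prime


A = {"a", "b", "c"}
B = {1, 2, 3, 4}
C = {"W", "X", "Y", "Z"}
h = {"a": 1, "b": 2, "c": 3}
v = {"a": "X", "b": "Y", "c": "Z"}
D, v_prime, h_prime = pushout(A, B, C, h, v)
print(len(D))  # 5
```

The five classes correspond to $1X$, $2Y$, $3Z$, $4$ and $W$; only elements sharing a pre-image in $A$ are glued.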

Another relevant concept related to pushouts is that of pushout complements.
A pushout complement is the completion of a pair of arrows $h : A \to B$ and
$v' : B \to D$ to a pushout, and is given by an object $C$ with morphisms $v : A \to C$
and $h' : C \to D$ such that the resulting square formed by $h$, $v$, $h'$ and $v'$ is a pushout.


Pullbacks 


If pushouts are ‘abstract gluings’ of morphisms, then ‘pullbacks’ may be understood
as ‘abstract intersections’ of morphisms. A pullback is given over a pair of morphisms
$h : B \to D$ and $v : C \to D$. The construction of the pullback of $h$ and $v$ yields a
fourth object $A$ with morphisms $v' : A \to B$ and $h' : A \to C$ such that the resulting
diagram commutes, i.e., $h \circ v' = v \circ h'$, and satisfies the ‘pullback condition’. This
condition requires that for all $A'$ with $e : A' \to B$ and $f : A' \to C$ such that
$h \circ e = v \circ f$, there exists a unique morphism $a : A' \to A$ such that $v' \circ a = e$ and $h' \circ a = f$.
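In Set the pullback has a direct description: $A$ is the set of pairs $(b, c)$ on which $h$ and $v$agree, with $v'$ and $h'$ the two projections. The Python sketch below (function name hypothetical) applies this to the two maps into $D = \{W, 1X, 2Y, 3Z, 4\}$ from the pushout example, which here play the roles of $h$ and $v$; in this example the pullback recovers a three-element copy of the original $A$.

```python
def pullback(B, C, h, v):
    """Pullback in Set of h: B -> D and v: C -> D (given as dicts).

    A is the set of pairs (b, c) on which h and v agree; v_prime and h_prime are
    the two projections, so h(v_prime(p)) == v(h_prime(p)) for every p in A.
    """
    A = {(b, c) for b in B for c in C if h[b] == v[c]}
    v_prime = {p: p[0] for p in A}   # A -> B
    h_prime = {p: p[1] for p in A}   # A -> C
    return A, v_prime, h_prime


# The injections into D = {W, 1X, 2Y, 3Z, 4} from the pushout example.
h = {1: "1X", 2: "2Y", 3: "3Z", 4: "4"}
v = {"W": "W", "X": "1X", "Y": "2Y", "Z": "3Z"}
A, v_prime, h_prime = pullback(h.keys(), v.keys(), h, v)
print(sorted(A))  # [(1, 'X'), (2, 'Y'), (3, 'Z')]
```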


Analogy via Pushouts, Pullbacks, and Pushout Complements 


We now demonstrate that these concepts give mechanisms for the automatic derivation
of analogies on the basis of abstractions. For a more complete description of pushouts,
pullbacks, and pushout complements in this context, see Taentzer et al. [76]. In
Fig. 9.5, we give a sketch of a pushout-based solution to classic letter-string analogy
problems. Letter-strings are represented as lists of natural numbers, with natural
numbers themselves represented via Peano arithmetic. We are working in the category
of labeled graphs (an ‘adhesive category’ [185]), along injective morphisms which
are both label- and structure-preserving. Note that in this scenario, relabeling may
be achieved in rewriting systems over labeled objects through the use of partially
labeled objects as intermediaries [126].

Fig. 9.5 A solution to the letter-string analogy question abc : abd :: ijk : ??? via pushouts.
Each letter-string is represented by its corresponding tree in Peano arithmetic, with the letter ‘a’
corresponding to the number ‘0’. The shorthand ‘$S^x$’ is used to concisely represent a sequence of $x$
consecutive successor nodes. Elements in red are ‘deleted’ by the transformation, whereas elements
in blue are ‘created’. The middle rule can be viewed as an abstract rule for transforming letter-strings,
and interpreted in language as ‘increment the final (non-a) character in the letter-string’

The example works as follows. We are provided with the expression trees of
the letter-strings abc, abd, and ijk. We know that abc relates to abd in some
manner and wish to induce the equivalent relation for ijk and some unknown letter-string.
Thus the task may be phrased ‘abc is to abd as ijk is to what?’. The
first step is to induce common substructures between the pairs abc and abd as
well as abc and ijk. Two particularly promising substructures are shown in the
top middle and left middle of the diagram. This first step is the most critical; the two
common substructures will deterministically define the rest of the transformation.
If either common substructure is too specific, it may form too rigid a restriction for
the rest of the process to be successful. Alternatively, if the common substructures
are too general, ambiguity may occur, causing there to be several possible analogous
letter-strings.

Given the two common substructures, a pullback may be constructed, yielding the
graph in the center of the diagram. This may be understood as the common elements
of both common substructures such that the top-left square commutes. With the
central element, we may then construct the pushout complements in the middle-right
and bottom-middle spaces, and finally the pushout of those elements with the central
element to give the final analogous graph: ijl. A remarkable property of this process
is that it actually yields an abstract rule $L \leftarrow K \rightarrow R$ (the elements of the middle
row), which may then be applied to any other letter-string expression $T$ according
to the following process:


1. Find a graph morphism $f : L \to T$.

2. Construct the pushout complement of $f$ and $L \leftarrow K$ to give intermediary $D$ with
morphisms $t : D \to T$ and $g : K \to D$.

3. Construct the pushout of $g$ and $K \to R$, giving result expression $S$.


This process is only applicable if there exists a graph morphism $f : L \to T$, and
this is only the case when the letter-string expression $T$ is at least 3 letters long and its
final letter is not a. Hence the rule $L \leftarrow K \rightarrow R$ may be interpreted in language as:
‘increment the final (non-a) character in the letter-string’. When applied to the letter-strings
ddd and mmjjkk it gives the unique results dde and mmjjkl respectively.
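The Python sketch below reproduces only the net effect of the induced rule on strings, not the graph-rewriting construction that derives it. Letters are encoded as Peano numerals as in Fig. 9.5 (‘a’ is zero), and the applicability test is the one stated above. The function names are hypothetical, the rule is hard-coded rather than induced from abc : abd, and incrementing ‘z’ is deliberately left unhandled.

```python
def to_peano(ch):
    """Encode a lowercase letter as a Peano numeral: 'a' -> "Z", 'b' -> ("S", "Z"), ..."""
    term = "Z"
    for _ in range(ord(ch) - ord("a")):
        term = ("S", term)
    return term


def from_peano(term):
    """Decode a Peano numeral back into a lowercase letter."""
    n = 0
    while term != "Z":
        _, term = term      # strip one successor node
        n += 1
    return chr(ord("a") + n)


def apply_increment_rule(s):
    """String-level effect of the rule L <- K -> R: add one successor node to the
    final letter. Returns None when no match f: L -> T exists, i.e. when the string
    is shorter than 3 letters or ends in 'a' (Peano zero). 'z' is not handled."""
    if len(s) < 3 or s[-1] == "a":
        return None
    terms = [to_peano(c) for c in s]
    terms[-1] = ("S", terms[-1])   # the 'created' successor node
    return "".join(from_peano(t) for t in terms)


for t in ["abc", "ijk", "ddd", "mmjjkk", "aba"]:
    print(t, "->", apply_increment_rule(t))
```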

Although LSA is given as an example domain, the treatment is a general one 
for adhesive categories [185]. This means that the principles described above can 
be leveraged to induce analogies over labeled graphs [76] and their derivatives (e.g. 
forests and trees), hierarchical graphs [72], and port graphs [85, 267]. 


9.4 Abduction 


There has recently been much interest in the applied category theory community in 
‘lenses’, which provide a simple but powerful abstraction for hierarchical feedback. 
Originally appearing in database theory and functional programming for describing 
deeply nested destructive updates [91], they later turned out to be of central
interest in categorical approaches to both game theory [108] and machine learning 
[88, 90]. One perspective on lenses is that they are a general theory of ‘things that 
compose by a chain rule’. 


Lenses 


Lenses are the morphisms of a category Lens whose objects are pairs of sets, where 
we think of the first as ‘forwards’ and the second as ‘backwards’. A lens 


$$\lambda : (X^+, X^-) \to (Y^+, Y^-)$$

is a pair consisting of a ‘forwards’ function:

$$\lambda^+ : X^+ \to Y^+$$

and a ‘backwards’ function:

$$\lambda^- : X^+ \times Y^- \to X^-$$


If we have another lens $\mu : (Y^+, Y^-) \to (Z^+, Z^-)$, then the composite lens
$\lambda\mu : (X^+, X^-) \to (Z^+, Z^-)$ has forwards function:

$$(\lambda\mu)^+(x) = \mu^+(\lambda^+(x))$$

while the backwards function is given by the characteristic law

$$(\lambda\mu)^-(x, z) = \lambda^-(x, \mu^-(\lambda^+(x), z))$$


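The category of lenses has a monoidal structure, which is given on objects by:

$$(X^+, X^-) \otimes (Y^+, Y^-) = (X^+ \times Y^+, X^- \times Y^-)$$

and on morphisms by:

$$(\lambda \otimes \mu)^+(x_1, x_2) = (\lambda^+(x_1), \mu^+(x_2))$$
$$(\lambda \otimes \mu)^-((x_1, x_2), (y_1, y_2)) = (\lambda^-(x_1, y_1), \mu^-(x_2, y_2))$$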


While this definition is written in terms of sets and functions, it turns out that lenses 
can be more generally defined over (essentially) any monoidal category, although 
the correct general definition is not obvious [291]. 
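A direct transcription of these definitions into Python may help fix the bookkeeping. The class below is a minimal sketch over plain functions; the names `Lens`, `then` and `tensor` are our own. `then` implements the composition law with the first lens applied first, matching the order of $\lambda\mu$ above, and `tensor` implements the monoidal product. The example is the ‘nested update’ lens from functional programming mentioned at the start of this section.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    """A lens (X+, X-) -> (Y+, Y-) between pairs of sets."""
    fwd: Callable[[Any], Any]        # lambda^+ : X+ -> Y+
    bwd: Callable[[Any, Any], Any]   # lambda^- : X+ x Y- -> X-

    def then(self, other: "Lens") -> "Lens":
        """Sequential composite written lambda mu in the text (self first)."""
        return Lens(
            fwd=lambda x: other.fwd(self.fwd(x)),
            bwd=lambda x, z: self.bwd(x, other.bwd(self.fwd(x), z)),
        )

    def tensor(self, other: "Lens") -> "Lens":
        """Monoidal product: act componentwise on pairs."""
        return Lens(
            fwd=lambda x: (self.fwd(x[0]), other.fwd(x[1])),
            bwd=lambda x, y: (self.bwd(x[0], y[0]), other.bwd(x[1], y[1])),
        )


# Focus on the first component of a pair: forwards reads it,
# backwards writes a new value into it.
fst = Lens(fwd=lambda p: p[0], bwd=lambda p, new: (new, p[1]))

nested = fst.then(fst)                      # focus on ((here, _), _)
print(nested.fwd(((1, 2), 3)))              # 1
print(nested.bwd(((1, 2), 3), 9))           # ((9, 2), 3)
```

The backwards pass of the composite threads the forwards result through both lenses, which is exactly how a deeply nested update is rebuilt from the inside out.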


Examples in Machine Learning 


Backpropagation. An important class of lenses consists of a function paired with its
first derivative; the backwards pass of such a lens is what is usually known as
reverse-mode automatic differentiation. Specifically, let Smth be the category whose
objects are Euclidean spaces and whose morphisms are differentiable functions. There
is a functor $D : \mathbf{Smth} \to \mathbf{Lens}$ which is given on objects by
$D(\mathbb{R}^n) = (\mathbb{R}^n, \mathbb{R}^n)$, where we think of the second $\mathbb{R}^n$ as the
cotangent space at a point in the first. On morphisms the functor is given by:

$$D(f) = (f, f^*), \quad \text{where } f^*(x, v) = J_f(x)^\top v$$

where $J_f$ is the Jacobian of $f$. In order for this to be a functor it must be the case
that the derivative of a composite function $fg$ (first $f$, then $g$, following the order of
lens composition) can be determined from the derivatives of $f$ and $g$ using the lens
composition law. This turns out to be essentially the chain rule: given composable
smooth functions $f$, $g$, we have
$$
\begin{aligned}
(D(f)D(g))^-(x, v) &= f^*(x, g^*(f(x), v)) && \text{(lens composition law)} \\
&= J_f(x)^\top J_g(f(x))^\top v \\
&= [J_g(f(x))\, J_f(x)]^\top v && \text{(law of matrix transpose)} \\
&= J_{fg}(x)^\top v && \text{(multivariate chain rule)} \\
&= (fg)^*(x, v)
\end{aligned}
$$


From this it follows that $D(f)D(g) = D(fg)$, i.e., that $D$ is a functor. This connection
between lenses and the chain rule was explicitly observed by Zucker [376] and
is also implicit in Fong et al. [90]. 
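The functoriality argument can be checked numerically. In the plain-Python sketch below, the helper names `D`, `compose` and `transpose_times` are hypothetical, and the Jacobians are written by hand rather than obtained from an autodiff library. Composing the lenses $D(f)$ and $D(g)$ yields the same backwards map as the transposed Jacobian of the composite $fg$.

```python
def transpose_times(J, v):
    """Return J^T v, where J is a matrix given as a list of rows."""
    return [sum(J[i][j] * v[i] for i in range(len(J))) for j in range(len(J[0]))]


def D(f, jacobian):
    """The lens (f, f*) with f*(x, v) = J_f(x)^T v."""
    return (f, lambda x, v: transpose_times(jacobian(x), v))


def compose(lens1, lens2):
    """Lens composition law: (f, f*) first, then (g, g*)."""
    (f, f_star), (g, g_star) = lens1, lens2
    return (lambda x: g(f(x)), lambda x, v: f_star(x, g_star(f(x), v)))


# f : R^2 -> R^2 and g : R^2 -> R, with hand-written Jacobians.
f = lambda x: [x[0] * x[1], x[0] + x[1]]
J_f = lambda x: [[x[1], x[0]], [1.0, 1.0]]
g = lambda y: [y[0] ** 2 + 3.0 * y[1]]
J_g = lambda y: [[2.0 * y[0], 3.0]]

fg, fg_star = compose(D(f, J_f), D(g, J_g))

# Gradient of (x0*x1)^2 + 3*(x0 + x1), computed by hand for comparison.
grad = lambda x: [2 * x[0] * x[1] ** 2 + 3, 2 * x[0] ** 2 * x[1] + 3]

x = [1.0, 2.0]
print(fg_star(x, [1.0]), grad(x))   # both [11.0, 7.0]
```

Feeding the cotangent $v = [1.0]$ into the composite backwards map recovers the ordinary gradient, which is precisely what reverse-mode differentiation computes.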


Variational inference. There is a category whose morphisms $X \to Y$ are probability
distributions on $Y$ conditional on $X$. There are several different ways to make
this precise, for example to say that $X$, $Y$ are measurable spaces and the conditional
distribution is modeled as a measurable function $X \to G(Y)$, where $G(Y)$ is
the measurable space of all probability measures on $Y$ [98, 110]. In the finite case, one
might instead say that objects are finite sets and morphisms are stochastic matrices.
Composition of morphisms is by ‘integrating out’ the middle variable (sometimes called
the Chapman-Kolmogorov equation [250]), which is simply matrix multiplication
in the finite case. Call this category Stoch. There is a mapping $\mathbf{Stoch} \to \mathbf{Lens}$ that
pairs a conditional distribution with the function that performs Bayesian inversion
on it, namely $f \mapsto (f, f^*)$, where $f^* : G(X) \times Y \to G(X)$ returns the posterior
distribution $f^*(x, y)$ given a prior $x$ and an observation $y$. Bayesian inversion satisfies
a ‘chain rule’ with respect to composition, meaning that the Bayesian inverse of a
composite conditional distribution can be computed in terms of the Bayesian inverses
of the components, and this fact precisely says that $\mathbf{Stoch} \to \mathbf{Lens}$ is a functor [318].
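The ‘chain rule’ for Bayesian inversion can be verified exactly on finite sets. The sketch below assumes finite distributions stored as Python dicts with exact `Fraction` probabilities; all function names are illustrative. One subtlety: the lens law hands $f^*$ a posterior over $Y$ rather than a single observation $y$, so the sketch averages $f^*$ over that posterior. This is valid here because $Z$ depends on $X$ only through $Y$.

```python
from fractions import Fraction as Fr

# A finite conditional distribution X -> G(Y) is a dict: x -> {y: probability}.


def pushforward(prior, f):
    """Push a distribution on X through f : X -> G(Y) (Chapman-Kolmogorov)."""
    out = {}
    for x, px in prior.items():
        for y, pyx in f[x].items():
            out[y] = out.get(y, 0) + px * pyx
    return out


def compose(f, g):
    """Composite channel X -> G(Z): integrate out the middle variable."""
    return {x: pushforward(fx, g) for x, fx in f.items()}


def bayes_inverse(f):
    """f* : G(X) x Y -> G(X), the posterior over X given a prior and an observation y."""
    def f_star(prior, y):
        joint = {x: px * f[x].get(y, 0) for x, px in prior.items()}
        total = sum(joint.values())
        return {x: p / total for x, p in joint.items()}
    return f_star


def compose_inverses(f, f_star, g_star):
    """Lens-style backwards map of the composite: infer the middle variable
    with g*, then average f* over that posterior."""
    def fg_star(prior, z):
        middle = g_star(pushforward(prior, f), z)    # posterior over Y
        post = {}
        for y, py in middle.items():
            if py == 0:
                continue                             # never condition on an impossible y
            for x, px in f_star(prior, y).items():
                post[x] = post.get(x, 0) + py * px
        return post
    return fg_star


prior = {"x0": Fr(1, 2), "x1": Fr(1, 2)}
f = {"x0": {"y0": Fr(3, 4), "y1": Fr(1, 4)}, "x1": {"y0": Fr(1, 4), "y1": Fr(3, 4)}}
g = {"y0": {"z0": Fr(2, 3), "z1": Fr(1, 3)}, "y1": {"z0": Fr(1, 5), "z1": Fr(4, 5)}}

direct = bayes_inverse(compose(f, g))(prior, "z0")
chained = compose_inverses(f, bayes_inverse(f), bayes_inverse(g))(prior, "z0")
print(direct == chained, direct["x0"])   # True 33/52
```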


Dynamic programming. Consider a Markov chain with state space $S$, action space
$A$, and (stochastic) transition function $P : S \times A \to S$. Suppose further that actions
are controlled by an agent, who obtains utility $U : S \times A \to \mathbb{R}$ on each transition. For
each policy $\pi : S \to A$ we obtain a function $f : S \to S$ given by $f(s) = P(s, \pi(s))$,
and a function $f^* : S \times \mathbb{R} \to \mathbb{R}$ given by $f^*(s, c) = U(s, \pi(s)) + \gamma c$, where
$0 < \gamma < 1$ is a fixed discount factor. The second input to $f^*$ is known as the continuation
payoff. These two functions constitute a lens $\lambda_\pi : (S, \mathbb{R}) \to (S, \mathbb{R})$, indexed by the
policy. On the other hand, a lens $V : (S, \mathbb{R}) \to (1, 1)$, where $1$ is a one-element set,
turns out to be just a function $V : S \to \mathbb{R}$, which we take to be the value function. If
$V$ is an initial value function and $\pi$ is the optimal policy with respect to it, the lens
composition $\lambda_\pi V : (S, \mathbb{R}) \to (1, 1)$ performs a single stage of value function iteration.
Thus value function iteration amounts to approximating the limit:

$$\cdots \xrightarrow{\lambda_{\pi_3}} (S, \mathbb{R}) \xrightarrow{\lambda_{\pi_2}} (S, \mathbb{R}) \xrightarrow{\lambda_{\pi_1}} (S, \mathbb{R}) \xrightarrow{V} (1, 1)$$

where each $\pi_i$ is the optimal policy for the current value function at each stage.⁶


6 This connection between dynamic programming and lenses is due to Viktor Winschel. 
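The Python sketch below spells out one stage of this construction for a finite problem. It simplifies the text's setting in one respect, stated as an assumption: transitions are deterministic (`P` maps a state-action pair to a single next state), so no expectation is needed. The names `policy_lens`, `value_lens`, `greedy` and `value_iteration_step` are hypothetical. Each stage composes $\lambda_\pi$ with the lens $V$ and reads off the backwards map, as in the composition law.

```python
def policy_lens(P, U, pi, gamma):
    """lambda_pi : (S, R) -> (S, R): forwards s -> P(s, pi(s)),
    backwards (s, c) -> U(s, pi(s)) + gamma * c (c is the continuation payoff)."""
    return (lambda s: P[s][pi[s]], lambda s, c: U[s][pi[s]] + gamma * c)


def value_lens(V):
    """A lens (S, R) -> (1, 1) is just V : S -> R: forwards goes to the single
    point () of 1, and backwards ignores its unit argument."""
    return (lambda s: (), lambda s, unit: V[s])


def compose(lens1, lens2):
    """Lens composition law: lens1 first, then lens2."""
    (f, f_b), (g, g_b) = lens1, lens2
    return (lambda x: g(f(x)), lambda x, z: f_b(x, g_b(f(x), z)))


def greedy(P, U, V, gamma):
    """Optimal one-step policy with respect to the current value function V."""
    return {s: max(P[s], key=lambda a: U[s][a] + gamma * V[P[s][a]]) for s in P}


def value_iteration_step(P, U, V, gamma):
    """One stage: the backwards map of lambda_pi V, evaluated at every state."""
    pi = greedy(P, U, V, gamma)
    _, backward = compose(policy_lens(P, U, pi, gamma), value_lens(V))
    return {s: backward(s, ()) for s in P}


# Two states; staying at "work" pays 1 per step, everything else pays 0.
P = {"home": {"stay": "home", "move": "work"}, "work": {"stay": "work", "move": "home"}}
U = {"home": {"stay": 0.0, "move": 0.0}, "work": {"stay": 1.0, "move": 0.0}}
V = {"home": 0.0, "work": 0.0}
for _ in range(60):
    V = value_iteration_step(P, U, V, gamma=0.5)
print(V)  # close to {'home': 1.0, 'work': 2.0}
```

With $\gamma = 0.5$ the fixed point is $V(\text{work}) = 1/(1 - \gamma) = 2$ and $V(\text{home}) = \gamma \cdot 2 = 1$.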
Hierarchical Symbolic-Numeric Lenses


All three examples of the ‘lens pattern’ we have described above for machine learning 
are notably ‘low-level’ and numerical. However, now that the common pattern has 
been identified, it is possible in principle to design systems which are structurally the 
same but which are semantically ‘higher-level’. This allows the best of both worlds: 
logical languages embodying GOFAI principles such as abduction and generaliza- 
tion can be combined with the hierarchical feedback which has been enormously 
successful in numerical and connectionist approaches. One option is to construct a 
monoidal functor $\mathcal{C} \to \mathbf{Lens}$, where $\mathcal{C}$ is a suitable category for higher-level
reasoning; another is to build additional structure into Lens itself using its more general
definition. 

The specific approach proposed here is to construct lenses in which the forwards 
map performs deductive reasoning, and the backwards map performs abductive reasoning.
The idea is that the forwards map $\lambda^+$ will, given a hypothesis $x$, generate a
deductive conclusion $\lambda^+(x)$, while the backwards map will, given an initial hypothesis
$x$ and an observation $y$, abductively generate an updated hypothesis $\lambda^-(x, y)$ in
order to explain the observation in a way that is in some sense ‘as close as possible’ 
to the starting hypothesis. 

Suppose now that from an initial hypothesis $x$ we make a 2-step deduction
$\mu^+(\lambda^+(x))$. If we then observe $z$, we can perform a 2-step abduction using the lens
composition law to determine a new hypothesis. First, using the deduced hypothesis
$\lambda^+(x)$ and the observation $z$, we use $\mu^-$ to abductively determine the new ‘middle’
hypothesis $\mu^-(\lambda^+(x), z)$. We then treat this as though it is an observation, which
together with the initial hypothesis $x$ abductively determines the final result:

$$\lambda^-(x, \mu^-(\lambda^+(x), z))$$


Another possibility is that forwards maps perform abstraction between different 
levels of representation of a state, and backwards maps are control commands (or 
desires). We will illustrate this with a simple worked example. Consider a factory 
robot moving in the region: 


$$R = \{(x, y) \in \mathbb{R}^2 \mid 0 \le x \le 10,\ 0 \le y \le 10\}$$

Since we will only be describing the robot’s movement with pseudocode, it suffices
to informally describe the map. The factory floor is divided into two zones, with a path
running through both. In each zone there is a goal adjacent to the path, representing
a place where the robot can pick up or deliver objects. An abstracted description of
the robot’s position is given by elements of Chunk × Zone, where


Zone = {zone1, zone2} 


is the set of zones and 
Chunk = {path, goal, nothing}


There is a function $\lambda^+_{\mathrm{pos}} : R \to \text{Chunk} \times \text{Zone}$ that abstracts the robot’s position.
Going the other way, a command to move to a certain state at the more abstract level
can be ‘translated down’ into a lower-level command to move in the space $R$. For
this we also need to know the current position in the concrete space $R$, making the
type of the backwards function:

$$\lambda^-_{\mathrm{pos}} : R \times (\text{Chunk} \times \text{Zone}) \to R$$


$\lambda^-_{\mathrm{pos}}$ need not move the robot directly to a position that satisfies the goal; that is, the
equation $\lambda^+_{\mathrm{pos}}(\lambda^-_{\mathrm{pos}}(x, g)) = g$ (known as the ‘put-get law’ of lenses) need not always
hold. Rather, $\lambda^-_{\mathrm{pos}}$ can direct the robot through a series of ‘waypoints’ by treating
the position variable as a state variable. What should be guaranteed is that holding
the goal fixed and iterating $\lambda^-_{\mathrm{pos}}(-, g) : R \to R$ from any starting position will after
finitely many steps reach a position $x \in R$ satisfying $\lambda^+_{\mathrm{pos}}(x) = g$ (provided the goal
is physically reachable for the robot). For example, our $\lambda^-_{\mathrm{pos}}(x, g)$ could be given by
the following pseudocode:

• If the current position satisfies the goal ($\lambda^+_{\mathrm{pos}}(x) = g$), then do nothing
($\lambda^-_{\mathrm{pos}}(x, g) = x$).

• Otherwise, if the current position is within a fixed short distance of the goal, then
move onto the center of the goal.

• Otherwise, if the robot is on the path, move along the path towards the goal.

• Otherwise, move directly onto the path.


Thus if we iterate $\lambda^-_{\mathrm{pos}}(-, (\text{goal}, \text{zone1}))$ from a typical starting position then the
robot will first move onto the path, then move along the path towards zone 1, and
then move onto the goal. Together the functions $\lambda^+_{\mathrm{pos}}$ and $\lambda^-_{\mathrm{pos}}$ constitute a lens:

$$\lambda_{\mathrm{pos}} : R \to \text{Chunk} \times \text{Zone}$$
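Because the text describes the map only informally, any executable version has to invent a concrete layout. The Python sketch below does so under explicitly hypothetical assumptions (a horizontal path along $y = 5$, zones split at $x = 5$, goal centers at $(2.5, 6)$ and $(7.5, 6)$, and tolerances of our choosing), then implements $\lambda^-_{\mathrm{pos}}$ by the four rules above. It illustrates that put-get fails for a single step, while iterating from a starting position reaches the goal.

```python
import math

# Hypothetical floor layout inside R = [0, 10] x [0, 10]: a horizontal path along
# y = 5, zone1 is x < 5, zone2 is x >= 5, and each zone has one goal next to the path.
PATH_Y = 5.0
GOAL_CENTERS = {"zone1": (2.5, 6.0), "zone2": (7.5, 6.0)}
TOL = 0.25    # how close counts as "on" the path or a goal
NEAR = 1.5    # the "fixed short distance" of the second rule
STEP = 1.0    # distance moved along the path per iteration


def pos_fwd(p):
    """lambda+_pos : R -> Chunk x Zone."""
    zone = "zone1" if p[0] < 5.0 else "zone2"
    if any(math.dist(p, c) <= TOL for c in GOAL_CENTERS.values()):
        chunk = "goal"
    elif abs(p[1] - PATH_Y) <= TOL:
        chunk = "path"
    else:
        chunk = "nothing"
    return (chunk, zone)


def pos_bwd(p, g):
    """lambda-_pos : R x (Chunk x Zone) -> R, following the four rules above.
    Only goals of the form ("goal", zone) are handled in this sketch."""
    center = GOAL_CENTERS[g[1]]
    if pos_fwd(p) == g:                       # goal satisfied: do nothing
        return p
    if math.dist(p, center) <= NEAR:          # close to the goal: move onto its center
        return center
    if abs(p[1] - PATH_Y) <= TOL:             # on the path: step along it
        dx = max(-STEP, min(STEP, center[0] - p[0]))
        return (p[0] + dx, PATH_Y)
    return (p[0], PATH_Y)                     # otherwise: move straight onto the path


# Iterate lambda-_pos(-, (goal, zone1)) from a starting point in zone2.
p, goal = (9.0, 1.0), ("goal", "zone1")
trajectory = [p]
for _ in range(100):                          # guard against non-termination
    if pos_fwd(p) == goal:
        break
    p = pos_bwd(p, goal)
    trajectory.append(p)
print(trajectory)
```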


We will now demonstrate how this lens can be a part of a hierarchy in which the next 
level is task-centric. Suppose the robot can carry an object, from the set: 


Object = {widget, gizmo, nothing} 


and sense its carry weight. A widget weighs 2, a gizmo weighs 7 and no object
weighs 0, defining a function $W : \text{Object} \to [0, \infty)$.


We define a lens:

$$\lambda_{\mathrm{wt}} : [0, \infty) \to \text{Object}$$

where

$$\lambda^+_{\mathrm{wt}} : [0, \infty) \to \text{Object}$$

classifies any weight less than 1 as nothing, any weight between 1 and 5 as a
widget and any weight greater than 5 as a gizmo. The backward function:

$$\lambda^-_{\mathrm{wt}} : [0, \infty) \times \text{Object} \to [0, \infty)$$

ignores the current weight, and takes the desired object to its resulting desired weight,
namely, $\lambda^-_{\mathrm{wt}}(w, o) = W(o)$.
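$\lambda_{\mathrm{wt}}$ is simple enough to transcribe directly. In the Python sketch below, the treatment of the boundary weights 1 and 5 (which the text leaves open) is our assumption, as are the function names. A quick check shows that, with these weights, this lens satisfies the put-get law that $\lambda_{\mathrm{pos}}$ need not satisfy.

```python
W = {"widget": 2.0, "gizmo": 7.0, "nothing": 0.0}   # W : Object -> [0, inf)


def wt_fwd(w):
    """lambda+_wt : [0, inf) -> Object (boundary choice at 1 and 5 is an assumption)."""
    if w < 1.0:
        return "nothing"
    if w <= 5.0:
        return "widget"
    return "gizmo"


def wt_bwd(w, o):
    """lambda-_wt : [0, inf) x Object -> [0, inf); the current weight w is ignored."""
    return W[o]


# Unlike lambda_pos, this lens does satisfy put-get: fwd(bwd(w, o)) == o.
print(all(wt_fwd(wt_bwd(3.3, o)) == o for o in W))   # True
```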


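We can run these two lenses in parallel, yielding:

$$\lambda_{\mathrm{pos}} \otimes \lambda_{\mathrm{wt}} : R \times [0, \infty) \to \text{Chunk} \times \text{Zone} \times \text{Object}$$

The parallel composition of lenses $\lambda_1 : X_1 \to Y_1$ and $\lambda_2 : X_2 \to Y_2$ is the lens:

$$\lambda_1 \otimes \lambda_2 : X_1 \times X_2 \to Y_1 \times Y_2$$

given by:

$$(\lambda_1 \otimes \lambda_2)^+(x_1, x_2) = (\lambda_1^+(x_1), \lambda_2^+(x_2))$$

and

$$(\lambda_1 \otimes \lambda_2)^-((x_1, x_2), (y_1, y_2)) = (\lambda_1^-(x_1, y_1), \lambda_2^-(x_2, y_2))$$

Unlike the sequential composition defined previously, this is a non-interacting composition.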
The second level of the hierarchy will be described as a lens:

$$\lambda_{\mathrm{task}} : \text{Chunk} \times \text{Zone} \times \text{Object} \to \text{Task}$$

where

Task = {task1, task2, nothing}


The backwards function takes the current state and the desired task, and returns the
desired next state required to complete the task. Task 1 entails collecting a widget
from the goal in zone 1 and delivering it to the goal in zone 2; task 2 entails collecting a
gizmo from the goal in zone 2 and delivering it to the goal in zone 1. $\lambda^-_{\mathrm{task}}((c, z, o), t)$,
which returns the robot’s desired next state given the current state $(c, z, o)$ and the task $t$, is
given by the following pseudocode:


• If the task is 1 and the held object is a widget, then proceed to the goal of zone
2 to deliver it:

$$\lambda^-_{\mathrm{task}}((c, z, \text{widget}), \text{task1}) = (\text{goal}, \text{zone2}, \text{nothing})$$

• If the task is 1 and the held object is a gizmo, then proceed to the goal of zone 2
to return it:

$$\lambda^-_{\mathrm{task}}((c, z, \text{gizmo}), \text{task1}) = (\text{goal}, \text{zone2}, \text{nothing})$$

• If the task is 1 and no object is held, then proceed to the goal of zone 1 to pick up
a widget:

$$\lambda^-_{\mathrm{task}}((c, z, \text{nothing}), \text{task1}) = (\text{goal}, \text{zone1}, \text{widget})$$

• If the task is 2, then there are three cases similar to the above.

• If there is no task, then remain in the current state:

$$\lambda^-_{\mathrm{task}}((c, z, o), \text{nothing}) = (c, z, o)$$
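The $\lambda^-_{\mathrm{task}}$ pseudocode can likewise be written as a small Python function (name hypothetical). The three task-2 cases, which the text only describes as ‘similar’, are filled in here as an assumed mirror image: collect a gizmo in zone 2, deliver it to zone 1, and return a stray widget to zone 1.

```python
def task_bwd(state, task):
    """lambda-_task : (Chunk x Zone x Object) x Task -> Chunk x Zone x Object.
    The task2 cases are an assumed mirror image of the task1 cases."""
    chunk, zone, obj = state
    if task == "nothing":
        return state                                  # no task: stay put
    # (zone to collect from, zone to deliver to, the object this task moves)
    source, target, wanted = {
        "task1": ("zone1", "zone2", "widget"),
        "task2": ("zone2", "zone1", "gizmo"),
    }[task]
    if obj == "nothing":
        return ("goal", source, wanted)               # go and pick up the object
    # Holding the wanted object (deliver it) or the other one (return it to where
    # it came from): for both tasks the destination is the target zone's goal.
    return ("goal", target, "nothing")


print(task_bwd(("path", "zone2", "nothing"), "task1"))   # ('goal', 'zone1', 'widget')
```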
We can now compose together the entire control system; it is the lens:

$$(\lambda_{\mathrm{pos}} \otimes \lambda_{\mathrm{wt}})\lambda_{\mathrm{task}} : R \times [0, \infty) \to \text{Task}$$


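This composite is commonly represented schematically by a ‘string diagram’ as
follows:

[String diagram: the wires $R$ and $[0, \infty)$ enter the boxes $\lambda_{\mathrm{pos}}$ and $\lambda_{\mathrm{wt}}$ respectively; their output wires Chunk, Zone and Object feed into the box $\lambda_{\mathrm{task}}$, whose output wire is Task.]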


The update function of this composite treats the output of $\lambda^-_{\mathrm{task}}$, which is the next
state desired by the high-level planner, as the desired input to the first level, which ‘translates’
it into the lower level of coordinates.

Around this composite system we must place an ‘environment’. On the right-hand 
side sits the human controller or other top-level planner, which decides the top-level 
tasks given the top-level observations. On the left sits the ‘physical environment’, 
consisting of the real world (or a simulation thereof), together with actuators that 
implement the robot’s bottom-level desires and sensors that produce the bottom-level 
observations. Crucially, this physical environment will typically have an internal state 
that cannot be observed by the robot. 

In our example, for simplicity we take the top-level planner to be a constant task. 
The physical environment will store the robot’s position and currently held object. 
The current position is continually updated with the desired position provided it is 
reachable in a straight line. The desired weight is ignored since the robot has no 
corresponding actuator. When the robot’s position is in one of the goal areas, the 
carried object will change as an object is either picked up or delivered. 
If we iterate the composite backward function:

$$((\lambda_{\mathrm{pos}} \otimes \lambda_{\mathrm{wt}})\lambda_{\mathrm{task}})^-(-, \text{task1})$$

from any starting position, the robot will repeatedly navigate between the goals of
zone1 and zone2, picking up and delivering widgets. (If it is initially carrying
a gizmo, it will first return the gizmo before picking up its first widget.)

This setup has the feature that time ‘naturally’ moves slower the higher one goes
up the hierarchy. Suppose the robot’s initial position is in zone2 and it is holding no
object. If the task is task1 then $\lambda^-_{\mathrm{task}}$ will output (goal, zone1, widget). Its (goal, zone1)
component will be used as the desired input to $\lambda^-_{\mathrm{pos}}$. The robot will navigate through several
stages towards the goal of zone1, during which time the output of $\lambda^-_{\mathrm{task}}$ will not
change. After the robot reaches the goal, the environment will update its held object
to a widget, which will cause $\lambda^+_{\mathrm{wt}}$ to change its output to widget. This in turn
will finally cause $\lambda^-_{\mathrm{task}}$ to change its output to (goal, zone2, nothing), signaling
a change in desire to move to the other goal to deliver the widget. This will again
stay constant while the lower level $\lambda_{\mathrm{pos}}$ navigates the robot towards the new goal.

Here we have proposed to found abductive inference on the category-theoretic 
machinery of lenses. Besides abduction, we have also shown how lenses generalize 
backpropagation, variational inference, and dynamic programming. We then introduced
novel ‘symbolic-numeric’ lenses, which allow hybrid structures, consisting
of both symbols and these pre-existing lenses, to be hierarchically composed. This is
important for implementing scalable planning: the general planning problem suffers
from both the branching factor and the length of the time horizon, which can be ameliorated
by lower dimensionality as well as longer time jumps. This can be achieved by progressively building
a hierarchy of ‘concepts’ and their affordances (cf. Sect. 11.2.2), and operationaliz- 
ing planning as abductive reasoning at the highest available level, which, thanks to 
the hierarchical composition, will still be firmly anchored in the sensorimotor level. 
In the next chapter, we will see how control loops, which are considered from a lens 
perspective by emerging research in categorical cybernetics, provide a compositional 
vocabulary to identify and regulate control systems. 
Chapter 10
2nd Order Automation Engineering


A scientific theory is intelligible for scientists if they can 
recognize qualitatively characteristic consequences without 
performing exact calculations. 


H.W. de Regt [61] 


In this chapter, semantic closure meets system engineering: we describe how SCL 
systems can be constructed and controlled in practice, casting a developmental per- 
spective on automation which we call ‘2nd order automation engineering’. Let us 
first give context to our objective, starting with a quote from Bundy and McNeil 
[40], who described in 2006 what they considered to be ‘a major goal of artificial 
intelligence research over the next 50 years’: 


For autonomous agents able to solve multiple and evolving goals, the representation must be 
a fluent, i.e., it must evolve under machine control. This proposal goes beyond conventional 
machine learning or belief revision, because these both deal with content changes within a 
fixed representation. The representation itself needs to be manipulated automatically. 


The extrinsic motivation of an SCL system is utilitarian: it is to accept and perform 
work on command, where work is specified in terms of states to be reached (goals) 
or avoided (constraints). Thus, the system shall proceed toward constructing a world 
model such that, ideally, the entire state space is closed under inferencing: starting 
from as many states as might present themselves, any goal state which concludes 
the work is predictable (deduction) and conversely, starting from any goal state, as 
many other states as might occur are reachable (abduction). 

By definition, work is a contingent measure for reducing entropy, which is always 
performed against an environment: laws of physics prevail, other agents follow their 
own agenda and, willfully or not, resist the ‘mission’ of the system in one way or 
another. Even the embodiment/agency of the system itself inevitably constrains its 
prospects of success. In that sense, the intrinsic motivation of the system is necessarily 
structural: it is to broaden the scope and depth (‘horizontal’ and ‘vertical’ structure) of
a world model while absorbing the ‘dents’ caused by its contingencies. The function
of an SCL system is therefore to reconcile both motivations, as per the definition of
semantic closure (Sect. 7.2). In other words, this means sustaining a structural goal-directed
homeostasis ‘at the edge of chaos’, as Kauffman puts it [167]; beyond that edge,
ignorance unleashes oscillating behaviors, ending up in futility at best, in disaster
at worst. Although Kauffman considers a multi-agent epigenetic landscape over
evolutionary time scales, we are concerned with the growth of a single mind over the
human time scale of work on command. As we will now see, ‘adaptation at the edge
of chaos’ is as relevant for automation engineering as it is for the development of
organisms, when re-cast as the hard-coded drive to control the multitude of feedback 
loops which unfold over the dynamic landscape of fine-grained models. 

Kauffman developed his concept of adaptation from a systemic perspective to 
explain how open systems evolve in complexity to fulfill their intrinsic determina- 
tion (spontaneous order) despite exogenous constraints (evolutionary pressure). But 
how do forms—the recognizable and composable manifestations of complexity— 
arise in the first place? When in 1917 D’Arcy Thompson published the first edition 
of ‘On growth and form’ [337] he probably did not anticipate that, four decades later, 
his inquiry into the genesis of stable structures would eventually be met by a formal 
theory that would extend its reach well beyond embryogenesis. In Thom’s ‘General 
Theory of Models’ [336], forms arise from a substrate, determined by a kinematics 
(the laws which govern the arrangement/interaction of its constituents) whereas the 
temporal evolution of forms is regulated by a separate dynamics. Indeed, forms gener- 
ally remain invariant under some selected pseudo-group G of interest (stability), yet 
sometimes break up completely (catastrophe) and new forms arise (morphogenesis). 
Thom’s theory of models is a mathematical framework, parameterized by G, for eliciting
the dynamics necessary to explain (and, when possible, to predict) change despite the
inevitable under-specification of the underlying kinematics.

We confront the related inverse problem: to control the morphogenesis of
controllers arising from a known kinematics. A general solution to this problem from the
perspective of system engineering (‘2nd-order automation engineering’) ultimately
deserves its own book. In the present work, we limit ourselves to presenting a minimal
viable design within the scope of the SCL framework.


10.1 Behavioral System Engineering 


As we have amply discussed in Part I, autonomous control systems are not systems 
that routinely switch between compiled behavior routines. Rather, the function of 
an autonomous control system is to engineer its own behavioral processes in com- 
pliance with requirements of scope, correctness, and timeliness. Behaviors result 
from control feedback loops instantiated over the substrate of a world model. Hence,
sustaining the homeostasis introduced above amounts to constructing and maintaining a
world model such that the instantiation of desired control loops is predictable within the
prescribed operational envelope. To an SCL system, control loops are the essential
observables by which it can assess the adequacy of its world model to the prospects
of fulfilling its mission; as such, they constitute the forms of interest to be reasoned
about.


Relational Control Loops 


Functionally, a control loop is the coupling of a subset of rules (the output types
of some rules are the input types of others) which guarantees the reachability
of a goal state, starting from a terminal state (for now, the state of an actuator; a
definition we will generalize later) constrained by parameters; parameters are
non-terminal states which are not under the direct control of the loop. The guarantee
of reachability is given by the composition of contracts, locally defined by rules.
A rule is defined by $(A, P_A, f) \to (B, P_B)$, where $A$ and $B$ are types (generally
composite), $P_A$ a predicate on $A$, $P_B$ a predicate on $B$, and $f$ a transfer function that
maps $A$ to $B$. The pair $(A, P_A)$ forms a refinement type [157], where the predicate $P_A$
defines which subset of the possible representable values actually corresponds to valid
instances of the type $A$ for the purpose of the rule. A contract is the guarantee that if
one predicate holds, so will the other. This perspective of programming-by-contract
[245], in which rules are relations (not functions), is close in spirit to the Behavioral
System Theory developed by Willems [365] and confers a number of advantages:
(1) unlike functions, all relations have a converse, (2) they allow ready modeling of
nondeterminism, and (3) they allow identification and composition of invariants.
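To make refinement types and contracts concrete, here is a minimal Python sketch. It assumes, purely for illustration, that predicates are axis-aligned boxes (in the chapter they are general formulae) and that a contract is only checked empirically on sample inputs; the names `Box`, `Rule`, and `contract_holds_on` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box, a simple stand-in for a refinement predicate."""
    lo: tuple
    hi: tuple

    def holds(self, v: Sequence[float]) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, v, self.hi))


@dataclass(frozen=True)
class Rule:
    """(A, P_A, f) -> (B, P_B): a transfer function guarded by two predicates."""
    pre: Box                           # P_A, refines the input type A
    f: Callable[[Sequence[float]], tuple]  # transfer function from A to B
    post: Box                          # P_B, refines the output type B

    def fire(self, a: Sequence[float]) -> Optional[tuple]:
        """Forward inference: only inputs satisfying P_A are in the rule's domain."""
        return self.f(a) if self.pre.holds(a) else None


def contract_holds_on(rule: Rule, samples) -> bool:
    """Empirically check the contract 'P_A(a) implies P_B(f(a))' on sample inputs."""
    return all(rule.post.holds(rule.f(a)) for a in samples if rule.pre.holds(a))
```

For instance, a rule with $f(a) = 2a + 1$, $P_A = [0, 1]$ and $P_B = [1, 3]$ satisfies its contract, whereas widening $P_A$ to $[0, 2]$ breaks it.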

Fig. 10.1 Relational control loops. a A generic model-based controller in functional form (coupling
of input/output values). A forward model fwd makes predictions p from sensor readouts s and
efference copies e; based on such predictions and set-point g, an inverse model inv computes the
action a. b The same controller in relational form (coupling of constraints on types) is a lens,
implemented by a rule R. The controlled state is $A_0$, the terminal state $A_1$, the parameters $\emptyset$. There
are two concurrent inference loops: (1) deduction loop and (2) abduction loop. The mixed flow (3)
computes possible actions. c Complex control loops are composed of several rules/loops. Here, the
controlled state is E, the terminal state $A_1$, the parameters $C \cup D$; see text for details

Figure 10.1 illustrates the relational perspective on control. Further possibilities
arising from an alternative categorical presentation via string diagrams due to Baez
and Erbele are discussed in Sect. 11.2.4. The generic relational controller (b) is a lens
implemented by a single rule

$$R : (A, P_A, f) \to (A_0, P_{A_0}), \qquad A = A_0 \cup A_1$$

through which concurrent inference flows are determined by contract satisfaction. In
the first case (b.1), contracts are propagated forward to refine the state of interest ($A_0$)
over an increasing time horizon. In the second case (b.3), there exists a solution $a$ such
that $f(P_A(a, p))$ satisfies $P_{A_0}(g)$. This is not so in the third case (b.2) where, instead
of $a$, a subgoal $g'$ is abducted such that $P_A(g', e)$ satisfies $P_{A_0}(g)$. The constraint
refined by $g'$ is furthermore refined over a decreasing time horizon as the b.2 loop
iterates until the circumstances (here, $p$) are eventually such that b.3 holds; then
the loop unwinds. This contract-constrained coupling of inferences constitutes the
general kinematics of the ruleset and drives the instantiation of control loops at any
scale, in terms of the rules involved.
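The interplay of cases b.3 and b.2 can be illustrated on a toy one-dimensional system. The sketch below assumes dynamics $x' = x + u$ with $|u| \le u_{\max}$, where the bound on $u$ plays the role of the terminal-state predicate; `abduce_plan` is a hypothetical name and the code is an analogy for the lens, not the SCL inference engine. Subgoals are abduced backward from $g$ over a decreasing horizon until one is directly reachable, and the chain then unwinds into actions.

```python
def abduce_plan(x0: float, g: float, u_max: float, max_depth: int = 100) -> list:
    """Actions driving x0 to g under x' = x + u with |u| <= u_max (toy model)."""
    subgoals = [g]
    # Case b.2: the current target is not reachable in one step, so abduce a
    # subgoal g' from which it is, i.e. one maximal step closer to x0.
    while abs(subgoals[-1] - x0) > u_max:
        if len(subgoals) > max_depth:
            raise RuntimeError("goal not reachable within max_depth steps")
        target = subgoals[-1]
        step = u_max if target > x0 else -u_max
        subgoals.append(target - step)
    # Case b.3 now holds for the innermost subgoal: unwind by forward deduction.
    actions, x = [], x0
    for sg in reversed(subgoals):
        a = sg - x        # a solution exists because |sg - x| <= u_max
        actions.append(a)
        x = x + a         # predicted next state
    return actions
```

With $x_0 = 0$, $g = 2.5$ and $u_{\max} = 1$, two subgoals (1.5 and 0.5) are abduced and the plan unwinds into the actions 0.5, 1.0, 1.0.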

The SCL framework is parameterized by its expression language. For concreteness
and ease of explanation, for the rest of this chapter we adopt the language of
first-order linear arithmetic¹ (FLA) [33], in which types are subspaces² of $\mathbb{R}^n$, rules are
linear, and predicates are constructed inductively through Boolean combinations and
quantification, i.e.,

$$\neg\varphi, \quad \varphi \wedge \psi, \quad \varphi \vee \psi, \quad \varphi \Rightarrow \psi, \quad \varphi \Leftrightarrow \psi, \quad \forall x.\,\varphi, \quad \exists x.\,\varphi$$

where $\varphi$ and $\psi$ are linear arithmetic formulae. Note that FLA negations express
that subregions of the state space are ‘to be avoided’ to the degree of their 
likelihood—‘forbidden’ when the likelihood approaches one—a trait we will lever- 
age in Sect. 10.4. Predicates in FLA are closed under the application of transfer 
functions, hence the predicate constraining the result of a rule can be automatically 
derived from the predicate constraining the inputs, namely by applying the transfer 
function to all of the constants, variables, and linear functions in the input predicate. 
Contracts can therefore be composed along any arbitrary chain of rules coupled via 
nonempty intersection between their input/output refinement types. This, in turn, 
allows representing control loops as (macro) rules and composing them as such; this 
plays an important part in system identification as we will discuss later. 
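The closure of FLA predicates under transfer functions can be sketched for the simplest case: a conjunction of linear inequalities $Hx \le h$ pushed through an affine transfer function $y = c + Mx$. This Python sketch assumes $M$ is invertible (a non-invertible map would require eliminating variables instead); the class `Polytope` and its methods are hypothetical names.

```python
import numpy as np


class Polytope:
    """H-representation {x : H @ x <= h} of a conjunction of linear constraints."""

    def __init__(self, H, h):
        self.H = np.asarray(H, dtype=float)
        self.h = np.asarray(h, dtype=float)

    def contains(self, x, tol: float = 1e-9) -> bool:
        return bool(np.all(self.H @ np.asarray(x, dtype=float) <= self.h + tol))

    def push_forward(self, M, c) -> "Polytope":
        """Image under y = c + M x, for invertible M.

        Substituting x = M^{-1}(y - c) into H x <= h gives
        (H M^{-1}) y <= h + H M^{-1} c: again a linear predicate.
        """
        H_new = self.H @ np.linalg.inv(np.asarray(M, dtype=float))
        return Polytope(H_new, self.h + H_new @ np.asarray(c, dtype=float))
```

Chaining `push_forward` along two rules derives the output predicate of the composite, e.g. the unit square mapped by a scaling-and-shift and then by a coordinate swap yields the box $[0, 2] \times [1, 3]$.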


Hierarchical Behavior Selection 


Behaviors result from planning, which itself results from the instantiation of control
loops, subject to the kinematics discussed above. In SCL, abductions are always
simulated: this allows exploring, concurrently, several possible courses of action 
(plans) to achieve a given goal. Of course, at some point in time, the system has to 
commit to one of these plans and start acting. 

1 We discuss other possibilities in Sect. 11.2.1.

2 Recall that, as befits the open-ended setting, n varies dynamically, according to the addition of
synthetic dimensions.

Each time an efference copy is a premise of a prediction, the corresponding (terminal)
goal is added to an auxiliary list associated with the prediction. Such lists are
concatenated as predictions become premises of others, provided that the trajectories
they define over the state space are consistent. As a result, a plan is a prediction of
a goal state, associated with a list of terminal goals ordered by deadline. We call
‘end-plan’ a plan whose goal state M is the one imposed by the mission. All
end-plans pursue the same goal and, at any point in time, there is at most one end-plan
committed to. Any time the deadline of a goal g (part of an end-plan) is reached, a
procedure commit is invoked; g is then said to be signaled. Note that the procedure is
invoked concurrently by goals whose deadlines overlap within its WCET. At time t,
a goal g invokes commit:


1. If there is a current plan and its likelihood is below some threshold of acceptance 
T, then cancel the plan. 

2. If there is no current plan, then find the best one (at a minimum, the one with the
highest likelihood above T); if none is found, return.

3. Execute all goals in the current plan whose deadline falls in
$[t - \mathrm{WCET}/2,\ t + \mathrm{WCET}/2)$, which have not been executed yet, and which are
signaled. Due to the composite nature of states, non-terminal goals are the conjunction
of sub-goals. If a terminal goal k is in such a conjunction C, then the execution of k is
effective if and only if the executions of all other goals in C are effective.

4. If the execution of g is not effective, then cancel g. 
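The four steps of commit can be sketched as follows. This is an illustration under stated assumptions, not the SCL implementation: the classes `Goal`, `Plan`, and `Committer` and the `execute` callback (which hands a goal to the actuators and reports whether the action was effective) are hypothetical, and an invoking goal that was not executed in the current window is treated as not effective.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Goal:
    name: str
    deadline: float
    signaled: bool = False
    executed: bool = False
    cancelled: bool = False
    # Names of the other terminal goals in the same conjunction C, if any.
    conjunction: list = field(default_factory=list)


@dataclass
class Plan:
    likelihood: float
    goals: list


class Committer:
    """Hypothetical holder of the end-plan currently committed to."""

    def __init__(self, candidates: list, threshold: float, wcet: float,
                 execute: Callable[[Goal], bool]):
        self.candidates = candidates  # end-plans produced so far
        self.threshold = threshold    # acceptance threshold T
        self.wcet = wcet              # WCET of the commit procedure
        self.execute = execute        # returns True if the execution was effective
        self.current: Optional[Plan] = None

    def commit(self, g: Goal, t: float) -> None:
        # Step 1: cancel the current plan if its likelihood fell below T.
        if self.current is not None and self.current.likelihood < self.threshold:
            self.current = None
        # Step 2: otherwise adopt the most likely plan above T; give up if none.
        if self.current is None:
            acceptable = [p for p in self.candidates if p.likelihood > self.threshold]
            if not acceptable:
                return
            self.current = max(acceptable, key=lambda p: p.likelihood)
        # Step 3: execute signaled, not-yet-executed goals due in [t - W/2, t + W/2).
        lo, hi = t - self.wcet / 2, t + self.wcet / 2
        due = {k.name: k for k in self.current.goals
               if lo <= k.deadline < hi and k.signaled and not k.executed}
        outcome = {}
        for name, k in due.items():
            outcome[name] = self.execute(k)
            k.executed = True
        # A goal in a conjunction is effective only if all its partners are too.
        effective = {name for name, ok in outcome.items()
                     if ok and all(outcome.get(o, False) for o in due[name].conjunction)}
        # Step 4: cancel the invoking goal if its execution was not effective.
        if g.name not in effective:
            g.cancelled = True
```

Representing conjunction partners by name keeps the dataclasses free of cyclic references; a real system would presumably derive the conjunctions from the composite goal states themselves.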


For the commit procedure to be effective, end-plans must be produced before the
deadline of their earliest goal, which is of course rarely the case in practice. Keeping
the WCET of computing end-plans within acceptable bounds requires hierarchical
planning. Rules are coupled via their types, which in turn are hierarchized by virtue
of abstraction; in that sense, type abstraction imposes a hierarchy upon the ruleset.
Consider, for example, a mobile robot with an arm equipped with a gripper. When the
gripper G is actuated, effectively seizing an object O, then when the robot moves,
observations that O moves along the same trajectory as G will ensue: a rule R will be
synthesized (see next section) to capture these, along with a positive precondition P on
R predicting the success of R based on the prior context of O having been close enough
to G and G having been actuated. The successful firing of P actually signifies ‘G grabbed
O’, denoted by a new abstract type T (also detailed in the next section). In terms of
abduction, T is considered a terminal state in the sense that the modalities of reaching T
are irrelevant to the activity of planning the movement of O to some target location.

All type abstractions (super-types, latent states, etc.) actually decouple control
loops. Accordingly, we extend the definition of terminal states to ‘states transduced to
lower levels of abstraction’, sensors and actuators being at level zero. ‘Transduced’
means that there exists a rule (a transducer) whose right-hand state is at a level of
abstraction higher than that of its left-hand state. Hence, a terminal state of a control
loop at level N may be the controllable state of another loop at N − 1, mediated by
some transducer, as illustrated in Fig. 10.2. Finally, we can extend our definition of
an end-plan to ‘a plan whose goal state is a terminal state’, considering the top-level
mission goal state M a terminal state. The commit procedure will be invoked for
all end-plans at some level N, each plan being computed in parallel to assemble the
subgoals of plans at level N + 1.

Fig. 10.2 Hierarchical behaviors (an ellipse denotes a path across the ruleset). An end-plan with
end-goal M at abstraction level N is composed of terminal goals, some of which (C and D) are
transduced (rules $T_C$ and $T_D$) into lower levels, where these global goals become end-goals for
local end-plans


System Identification 


In automation engineering, system identification refers to the activity (human labor) 
of modeling an existing system at some appropriate level of description, chosen to 
confer some expected operationality: this can be, for example, that of predicting, 
controlling, upgrading, or all of these (e.g. to build a digital twin [280]). Confronted 
by the variability of both its environment and mission, an SCL system continually 
adapts its world model to ensure the existence of the necessary control loops, the 
set of which constitutes a model of its capabilities (self model) while—conversely— 
delineating the frontiers of its ignorance. Self-identification is the internal process 
producing and maintaining such a self model for a purpose: deficiencies therein 
are bound to trigger adaptation of the world model via rule and type synthesis.
Conversely, adaptation is also triggered externally, any time the actual world is ‘sur- 
prisingly’ at odds with its model. Accordingly, the SCL dynamics is implemented 
by two complementary and concurrent synthesis processes: one proactive, the other 
reactive—Fig. 10.3 gives an overview. 

In SCL, both the self and world models are written in the same language express- 
ing the same substrate, on which operates the same categorical interpreter—the self 
model consists of rules which abstract the rules modeling the world. At this opera- 
tional level of description, a control loop is a mapping between a system trajectory 
toward a goal state and a subset of the world model at some level of abstraction (its 
local substrate). To identify a control loop is to find a substrate matching the pattern 
illustrated in Fig. 10.4. Specifically, the substrate of a loop controlling the state S is 
a set of rules verifying the following structural properties: 


(P1) There exists an inference loop for predicting and subgoaling S.

(P2) There exists a terminal state T such that (a) T is abducted from S and (b) S is
predicted from efference copies of T.

Fig. 10.3 Reflective system adaptation. System identification is performed on and by the system
itself as its world model grows. The dynamics of the system is controlled by two processes by
means of rule and type synthesis: A reacts to variations of surface phenomena (Sect. 10.2), whereas
B proactively invokes structural heuristics (intensification/diversification, Sect. 10.3). B is
domain-independent, decoupled from A, and operates deeper in the structure of the world model and at
longer time horizons. Constraints on resources and responsiveness are less stringent for B than for
A

Fig. 10.4 General control loop template. T is the terminal state, S the controlled state. The
properties P1, P2a and P2b are illustrated in blue, red, and green, respectively; see text for details


The self model is both hierarchical and compositional, both properties being 
inherited from rules and contracts. Technically, two rules 


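$$X : A \to B \quad \text{and} \quad Y : C \to D \quad \text{with} \quad B \cap C \neq \emptyset$$

are composed serially into a third rule $X + Y = Z : F \to G$, where

$$F = (A \cup C) \setminus (B \cap C) \quad \text{and} \quad G = (B \cup D) \setminus (B \cap C)$$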
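Serial composition only manipulates the type signatures of the rules, so it can be sketched with plain set operations. The sketch assumes, for illustration only, that a composite type is a finite set of atomic type names and ignores predicates and transfer functions; `compose` is a hypothetical name.

```python
def compose(X, Y):
    """Serial composition X + Y of rules given as (input_types, output_types)."""
    (A, B), (C, D) = X, Y
    link = B & C                       # types produced by X and consumed by Y
    if not link:
        raise ValueError("rules are not coupled: B and C do not intersect")
    F = (A | C) - link                 # remaining inputs of the composite
    G = (B | D) - link                 # remaining outputs of the composite
    return F, G
```

For example, composing $X : \{a, u\} \to \{b\}$ with $Y : \{b, c\} \to \{d\}$ hides the internal link $b$ and yields $Z : \{a, u, c\} \to \{d\}$.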


We extend the + operator to express joint-serial composition: a set of $n$ rules
$\{X_i : A_i \to B_i\}$ is composed with a rule

$$Y : C \to D \quad \text{with} \quad B_i \cap C \neq \emptyset, \quad i = 1, \dots, n$$

into a third $X + Y = Z : F \to G$, where

$$F = \Big(\bigcup_{i=1}^{n} A_i \cup C\Big) \setminus \bigcup_{i=1}^{n} (B_i \cap C)$$

and

$$G = \Big(\bigcup_{i=1}^{n} B_i \cup D\Big) \setminus \bigcup_{i=1}^{n} (B_i \cap C)$$

Fig. 10.5 Loop composition. The result of a + operation is a loop describing its arguments at one
level of hierarchy higher. Controlled states are marked in blue, terminal states in orange, parameters
in green, and control loops in grey; see text for details
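The joint-serial case generalizes the same set operations, and for $n = 1$ it reduces to the serial composition $X + Y$. As before, this sketch assumes that composite types are finite sets of atomic type names; `joint_compose` is a hypothetical name.

```python
def joint_compose(Xs, Y):
    """Joint-serial composition {X_1, ..., X_n} + Y over set-valued types."""
    C, D = Y
    links = [B & C for _, B in Xs]
    if not all(links):
        raise ValueError("every X_i must be coupled to Y (B_i and C must intersect)")
    link = set().union(*links)                 # union of the B_i ∩ C
    A_all = set().union(*(A for A, _ in Xs))   # union of the A_i
    B_all = set().union(*(B for _, B in Xs))   # union of the B_i
    return (A_all | C) - link, (B_all | D) - link
```

Two producers $X_1 : \{a\} \to \{b\}$ and $X_2 : \{e\} \to \{f\}$ feeding $Y : \{b, f, c\} \to \{d\}$ compose into $Z : \{a, e, c\} \to \{d\}$.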


Control loops can therefore be expressed by the same formalism used for rules (see
Fig. 10.5) and, in particular, the general loop template with terminal state T and
controlled state S is

$$L(T, S) = \{R_0 + r_1 + R_1 + r_2,\; R_2 + r_3 + R_3\} + R_4 + r_4 + R_5$$

where the $r_i$ are free parameters and the $R_i$ are subject to the following constraints:

$$\begin{array}{ll}
R_0 : A_0 \to B_0,\; T \in A_0 & R_1 : A_1 \to B_1,\; S \in A_1 \\
R_2 : A_2 \to B_2,\; T \in A_2 & R_3 : A_3 \to B_3 \\
R_4 : A_4 \to B_4,\; B_3 \cap A_4 \neq \emptyset & R_5 : A_5 \to B_5,\; S \in B_5
\end{array}$$


L is then the search pattern for identifying control loops during system identification 
at arbitrary levels of abstraction. 
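Using the template as a search pattern involves, at the very least, checking the structural constraints on the $R_i$. The sketch below checks only those constraints for a candidate assignment of six rules; it assumes set-valued types as in the composition sketches, does not search the ruleset, and does not check the free parameters $r_i$. The name `matches_template` is hypothetical.

```python
def matches_template(rules, T, S) -> bool:
    """Check the constraints of L(T, S) for a candidate assignment.

    `rules` maps each index 0..5 to a pair (A_i, B_i) of sets of type names.
    """
    (A0, _), (A1, _), (A2, _), (_, B3), (A4, _), (_, B5) = (rules[i] for i in range(6))
    return (T in A0 and S in A1 and T in A2     # R_0, R_2 read T; R_1 reads S
            and bool(B3 & A4)                   # R_3 feeds R_4
            and S in B5)                        # R_5 produces S
```

A full identification procedure would additionally have to enumerate candidate assignments and chains $r_i$ across the ruleset, which is where the cost of system identification lies.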
10.2 Reactive Synthesis


The reactive synthesis of a rule is triggered by one of the following two events: the 
unpredicted success of a goal, or the failure of a prediction—a missing path in the 
model structure in the first case, a faulty path in the second. We begin with the first 
case. 


Local Scale: Rules 


We are concerned with synthesizing a rule of the form $A \to B$, where B is the
observed goal state which the system could not predict, and A the history of all states
observed before B. Of course, a heuristic is needed to limit the scope of A. For
example, A can be restricted to the tuple formed by the last N most salient states
only, the saliency of a state S being the attention it receives relative to others, i.e.,
the relative number of references to S in the jobs of the highest priority (see Sect. 8.2).
Most relevantly, at the end of this section we introduce the notion of postcondition,
which summarizes histories of states as hierarchical patterns. Saliency can then be
strengthened by considering the ‘height’ of a state in a hierarchy, reminiscent of the
‘chunks’ theorized to compose human short-term memory and learning (see [114]
for a computational model).

As illustrated in Fig. 10.6, the synthesis of a rule $A \to B$ requires $\dim(A) + 1$
successive affinely independent (in the plane: non-collinear) sample pairs $(a_i, b_i)$ of
vectors in A and B. Each newly observed vector in A (e.g. $a_3$) is transformed into its
rejection from the hyperplane formed by the previous samples ($a_1$ and $a_2$); its
associated vector in B ($b_3$) is transformed accordingly. It then suffices to transform the
resulting orthogonal basis into an axis-aligned unit basis and to transform the $b_i$
accordingly in order to define the transfer function f as $b : B = C + M \times a : A$. $P_A$ is
the polytope enclosing the $a_i$, whereas $P_B$ is that enclosing the $b_i$. Null columns in M
indicate that the inputs located in the tuple A at these column indices are irrelevant to the
output; both A and $P_A$ are pruned accordingly.

Fig. 10.6 Synthesis of a rule $(A, P_A, f) \to (B, P_B)$ with $A = \mathbb{R}^2$ and $B = \mathbb{R}^2$. a Samples, each
a pair $(a_i, b_i)$ of vectors in A and B, are observed in a succession from which an orthogonal
basis ($\alpha$) of A is constructed. b $\alpha$ is then transformed into an axis-aligned unit basis ($\beta$); f is
$b : B = C + M \times a : A$. $P_A$ is the polytope formed by the $a_i$, whereas $P_B$ is that formed by the $b_i$;
see text for details
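The incremental construction can be written down directly. In this sketch each new input displacement is replaced by its rejection from the span of the previous ones (Gram–Schmidt), its output displacement is transformed by the same linear combination, and normalizing to unit length plays the role of the axis-aligned unit basis, so that $M = \sum_k q_k p_k^{\top}$. It assumes exact, noise-free samples and does not compute the polytopes $P_A$ and $P_B$; `synthesize_affine` is a hypothetical name.

```python
import numpy as np


def synthesize_affine(samples_a, samples_b, tol: float = 1e-9):
    """Fit b = C + M a from dim(A) + 1 affinely independent sample pairs."""
    a = [np.asarray(x, dtype=float) for x in samples_a]
    b = [np.asarray(y, dtype=float) for y in samples_b]
    a0, b0 = a[0], b[0]
    basis_a, basis_b = [], []     # orthonormal input directions and their images
    for ai, bi in zip(a[1:], b[1:]):
        u, v = ai - a0, bi - b0   # displacements from the first sample
        for p, q in zip(basis_a, basis_b):
            coef = p @ u
            u, v = u - coef * p, v - coef * q   # rejection, mirrored in B
        norm = np.linalg.norm(u)
        if norm < tol:
            raise ValueError("samples are not affinely independent")
        basis_a.append(u / norm)
        basis_b.append(v / norm)
    M = sum(np.outer(q, p) for p, q in zip(basis_a, basis_b))
    C = b0 - M @ a0
    # Null columns of M flag inputs that are irrelevant to the output.
    irrelevant = [j for j in range(M.shape[1]) if np.allclose(M[:, j], 0.0)]
    return C, M, irrelevant
```

Because each $v_k$ stays the image of $u_k$ under the unknown linear map, the orthonormal pairs $(p_k, q_k)$ determine that map exactly once $\dim(A)$ directions have been collected.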

Polytopes grow as new positive evidence (samples) is observed, whereas negative
evidence triggers the synthesis of new local restrictions of f. In general, f is not
linear globally (over the entire state space), hence applying f outside of $P_A$ may
yield an error beyond acceptable tolerance. As a consequence, a new rule R′ is to be
synthesized as above, whose predicate $P'_A$ is disjoint from $P_A$. By this construction,
$P_A$ is kept convex: convexity warrants interpolation should an input fall within a
predicate; outside the predicate, outputs are extrapolated, albeit with a confidence
decreasing with the distance of the input to the predicate. Note that it is also possible
to subject the likelihood of interpolated outputs to experience. For this, it suffices
to maintain the covariance matrix and centroid of all past inputs: the likelihood of
interpolated outputs is then inversely proportional to the Mahalanobis distance (times
the reliability of the rule and the likelihood of the inputs).
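Maintaining the centroid and covariance of past inputs can be done incrementally. This sketch uses Welford's online update; the $1/(1 + d)$ decay is an assumption standing in for "inversely proportional", chosen so that the likelihood stays finite at $d = 0$. The names `InputStats` and `output_likelihood` are hypothetical.

```python
import numpy as np


class InputStats:
    """Running centroid and covariance of the inputs seen by one rule."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))   # sum of outer products of deviations

    def update(self, x) -> None:
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)   # Welford's update

    def mahalanobis(self, x) -> float:
        cov = self.m2 / max(self.n - 1, 1)
        d = np.asarray(x, dtype=float) - self.mean
        # pinv tolerates a singular covariance (few or degenerate samples).
        return float(np.sqrt(max(d @ np.linalg.pinv(cov) @ d, 0.0)))


def output_likelihood(stats: InputStats, x, rule_reliability: float,
                      input_likelihood: float) -> float:
    """Likelihood of an interpolated output; the 1 / (1 + d) form is assumed."""
    return rule_reliability * input_likelihood / (1.0 + stats.mahalanobis(x))
```

With the four corners of the square $[0, 2]^2$ as past inputs, an input at the centroid keeps the full product of reliability and input likelihood, and the likelihood then falls off with the Mahalanobis distance from $(1, 1)$.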

Global non-linear mappings are approximated as the union of local linear ones,
each defining its own convex local restriction of a more global domain. The mutual
exclusion of the local domains $P_A$ and $P'_A$ is maintained, as they grow, using
preconditions as follows:


1. If $P'_A \subset P_A$, then a negative precondition on R is synthesized with $P'_A$ as its
left-hand predicate.

2. Otherwise, if $P'_A \cap P_A \neq \emptyset$, then one negative precondition is synthesized for
each rule, with $P'_A \cap P_A$ as its predicate.

3. Otherwise, no precondition on either rule is synthesized.
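The three cases can be sketched as a decision over the two domains. For illustration only, the sketch approximates $P_A$ and $P'_A$ by axis-aligned boxes (the text uses general convex polytopes) and returns the negative preconditions to synthesize as (target rule, predicate) pairs; `Box` and `exclusion_preconditions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    lo: tuple
    hi: tuple

    def subset_of(self, other: "Box") -> bool:
        return all(ol <= l and h <= oh
                   for l, h, ol, oh in zip(self.lo, self.hi, other.lo, other.hi))

    def intersect(self, other: "Box"):
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Box(lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def exclusion_preconditions(p_a: Box, p_a_new: Box) -> list:
    """Negative preconditions keeping the domains of R (P_A) and R' (P'_A) exclusive."""
    if p_a_new.subset_of(p_a):
        return [("R", p_a_new)]                    # case 1: R must not fire in P'_A
    overlap = p_a.intersect(p_a_new)
    if overlap is not None:
        return [("R", overlap), ("R'", overlap)]   # case 2: one per rule
    return []                                      # case 3: already disjoint
```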


Global Scale: Pre- and Postconditions 


A second possible trigger for the reactive synthesis of a rule is the failure of a
prediction. We proceed as in the case of the unexpected success of a goal, to synthesize
a rule $N : (C, P_C, f) \to (D, P_D)$, where $(D, P_D)$ is the refinement type denoting the
failure of some rule R. N is a negative precondition for R: when N fires, the reliability
of the inferences produced by R is lowered in proportion to that of the inference $d : D$
produced by N; during abduction, a goal matching the right side of R triggers the
production of a subgoal to prevent N from firing, and hence other subgoals are derived
in order to avoid instances of C. Figure 10.7 illustrates the flow of inferences.
Whereas predicates in left-hand refinement types impose local constraints on
forward input bindings, negative preconditions impose global constraints on the
target rule, i.e., they define the context in which the rule is likely to fail, regardless
of the validity of the bindings of its left-hand refinement type. Just as there exist
contexts for failure, there exist contexts for success: these are captured in the
left-hand refinement types of positive preconditions (see Fig. 10.8). The synthesis of the
positive precondition P is triggered by the success of R if not already predicted
by another positive precondition. Note that preconditions are rules and, as such, can
themselves be subjected to other preconditions.

Fig. 10.7 A negative precondition N for a rule R is represented. The first type (in grey) embedded
in the type of either the left- or right-hand term of R always denotes the success of R; inputs
matching $A_0$ or $B_0$ do not trigger inferencing. a Forward inference (in blue): any time an instance
$a : A$ triggers a forward inference, a prediction success(R, a) is produced by R, which in turn may
trigger the production by N of its negation: this lowers the likelihood of $b_1$. b Backward inference
(in red)

Any time a precondition $P : A \to S$ (where S is the success/failure of some
rule R) is synthesized, an abstract type X and a transducer T are also surmised: X
is defined as a new dimension of the state space, and T as $A \to X$. The transfer
function of T is learned as any other function, with the minor difference that, since
X is abstract (and therefore not observable), evidence for the outcome of R is taken as
a proxy for evidence of X.

Mirroring preconditions: whereas predicates in right-hand refinement types define
local guarantees on forward output bindings, postconditions define global
consequences of rule firings. Recall that forward rule firing statements (e.g., success(R, ...)
or ¬success(R, ...)) are first-class states: hence they can be surmised as input types
embedded in the composite left-hand type of a rule. Postconditions are rules taking
the (forward) firing of other rules as input types. Such input types summarize (parts
of) the state history: if a rule R admits $A_i$ as input types composing its left-hand
type, then the single input type success(R, ...) in a postcondition S is equivalent to
S admitting all the $A_i$ (and the conjunction of the negations of all negative
preconditions for R) as its own input types. Conversely, an input type ¬success(R, ...) in
S is equivalent to the negation of at least one of the $A_i$ or the assertion of at least
one of the negative preconditions for R. Note that some of the $A_i$ may themselves
be the firing of postconditions: this enables postconditions to capture state histories
as temporal hierarchical patterns.

Fig. 10.8 A positive precondition P for a rule R is represented. a Forward inference: any time
an instance $c : C$ triggers a forward inference, a prediction success(R, *) is produced by P: this
increases the likelihood of $b_1$. b Backward inference


10.3 Proactive Synthesis 


The purpose of proactive synthesis is to mitigate deficiencies observed in the set
of macro-rules (control loops) describing the system, resulting from the process
of system identification, as discussed in Sect. 10.1. We now proceed to describe
an elementary strategy for proactive synthesis which frames intrinsic motivation in
terms of control; alternative approaches are clearly an open-ended research
issue. Within the discipline of metaheuristic search, it is often useful to characterize
the state trajectory of a system as diversifying or intensifying.³ According to Glover
and Laguna [113]:


The main difference between intensification and diversification is that during an intensifica- 
tion stage the search focuses on examining neighbours of elite solutions. The diversification 
stage on the other hand encourages the search process to examine unvisited regions and to 
generate solutions that differ in various significant ways from those seen before. 


3 The related notions of exploration and exploitation also appear in the literature on evolutionary 
computation and intrinsic motivation. 
As observed by Blum and Roli [26], the notions of intensification and
diversification are not mutually exclusive. For example, while random search is strongly
diversifying and gradient descent is strongly intensifying, simulated annealing [172]
lies somewhere in between, progressing from diversification at high temperatures to
intensification at low temperatures.

The control mechanism we describe here is derived from that of Reactive
Tabu Search (RTS) [18, 19], an extension of Glover’s original tabu search [113].
Tabu search is a local search metaheuristic [208] in the same family of
‘single-point search’ techniques as stochastic gradient descent and simulated annealing.
The basic mechanism of tabu search (termed recency-based memory) maintains a
restricted local neighbourhood by prohibiting the choice of a neighbouring state if
(some attribute of) that neighbour has been encountered recently. The simplest (or
fixed-tabu) implementation represents the recency structure as a sequence of the
last k states encountered, where k is the tabu-tenure. In addition to the
recency-based memory structure (which could be said to model ‘short-term’ memory), many
implementations also maintain a ‘long-term’ or frequency-based memory, which is
essentially a frequency histogram of attributes with a tenure much larger than k.
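A fixed-tabu memory of this kind takes only a few lines. In this sketch the recency structure is a bounded `deque` of the last k attributes and the frequency structure is a `Counter`; for simplicity its tenure is unbounded rather than merely much larger than k. The names `TabuMemory` and `tabu_step` are hypothetical.

```python
from collections import Counter, deque


class TabuMemory:
    """Recency-based (short-term) and frequency-based (long-term) memory."""

    def __init__(self, tenure: int):
        self.tenure = tenure                 # k, the tabu-tenure
        self.recent = deque(maxlen=tenure)   # last k attributes encountered
        self.frequency = Counter()           # histogram of all attributes seen

    def is_tabu(self, attribute) -> bool:
        return attribute in self.recent

    def record(self, attribute) -> None:
        self.recent.append(attribute)        # oldest attribute drops out automatically
        self.frequency[attribute] += 1


def tabu_step(neighbours, attribute_of, score, memory: TabuMemory):
    """Move to the best-scoring neighbour whose attribute is not tabu."""
    allowed = [n for n in neighbours if not memory.is_tabu(attribute_of(n))]
    if not allowed:
        return None
    best = max(allowed, key=score)
    memory.record(attribute_of(best))
    return best
```

With a tenure of 2, repeatedly choosing among the states 4, 5 and 6 under a score peaking at 5 visits 5, 4, 6 and then 5 again: the prohibition forces the search away from the optimum until 5 leaves the recency list.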

The essential idea of RTS is to inform control via dynamical system metrics. We 
therefore briefly recap dynamical systems terminology. Informally, an attractor of 
a dynamical system is a set of points in state space such that all ‘nearby’ points in 
the state space eventually move close to it. The simplest dynamical system is one in 
which there is a single fixed point which acts as an attractor for all states. The next 
simplest attractor is a limit cycle, in which trajectories converge to a closed loop. 
The cycle length of a dynamical system in state s is the number of iterations since
s was last encountered (or ∞ if no encounter with s is recorded). It is also possible
for the trajectory to be confined within some region of phase space but exhibit no
obvious periodicity, due to the presence of a so-called strange attractor. RTS thus
instruments the search space with recency and frequency information, in order to 
drive control mechanisms that: 


1. are self-adaptive (tabu-tenure is a function of the moving average of the detected 
cycle-length); 

2. maintain a good balance between intensification (i.e. exploration of promising 
regions) and diversification; 

3. recognize when the search lies in the attractor of an unpromising region. 


The presence of previously encountered attributes in the recency list triggers
the ‘fast-reaction’ mechanism, which leads on successive iterations to a geometric
increase in recency tabu-tenure. This tends to force the search to choose neighbours
with unexplored attributes and will eventually break any limit cycle. Conversely, a
‘slow-reaction’ mechanism acts to counter redundant prohibition by reducing the
tabu-tenure when the fast-reaction mechanism has been idle for some time. In order
to detect the presence of strange attractors, the number of repeated attributes in
the frequency structure is examined. When the number of such attributes exceeds a
threshold level, an ‘escape mechanism’ is activated, which (as per Battiti’s original
implementation) consists of a random walk of length proportional to the moving
average of the current cycle length.
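The fast reaction, slow reaction, and escape trigger can be sketched as a small controller over attribute visits. The multiplicative factors, idle limit, and thresholds below are illustrative defaults, not values taken from Battiti’s papers; `ReactiveTenure` and `escape_length` are hypothetical names, and the random walk itself is left to the caller.

```python
from collections import deque


class ReactiveTenure:
    """Tenure control loosely modelled on Reactive Tabu Search."""

    def __init__(self, increase=1.1, decrease=0.9, idle_limit=50,
                 repeat_limit=3, chaotic_limit=10, window=20):
        self.tenure = 1.0
        self.increase, self.decrease = increase, decrease
        self.idle_limit = idle_limit        # quiet iterations before the slow reaction
        self.repeat_limit = repeat_limit    # visits after which an attribute is 'often repeated'
        self.chaotic_limit = chaotic_limit  # how many such attributes trigger an escape
        self.last_seen = {}                 # attribute -> iteration of last visit
        self.visits = {}                    # attribute -> visit count (frequency memory)
        self.cycles = deque(maxlen=window)  # recent cycle lengths for the moving average
        self.iteration = 0
        self.last_change = 0

    def observe(self, attribute) -> bool:
        """Record a visit and adapt the tenure; return True if an escape is due."""
        self.iteration += 1
        self.visits[attribute] = self.visits.get(attribute, 0) + 1
        if attribute in self.last_seen:
            # Fast reaction: every repetition multiplies the tenure.
            self.cycles.append(self.iteration - self.last_seen[attribute])
            self.tenure *= self.increase
            self.last_change = self.iteration
        elif self.iteration - self.last_change > self.idle_limit:
            # Slow reaction: shrink the tenure after a long quiet period.
            self.tenure = max(1.0, self.tenure * self.decrease)
            self.last_change = self.iteration
        self.last_seen[attribute] = self.iteration
        often = sum(1 for v in self.visits.values() if v > self.repeat_limit)
        return often > self.chaotic_limit

    def escape_length(self, scale: float = 1.0) -> int:
        """Random-walk length proportional to the moving-average cycle length."""
        if not self.cycles:
            return 0
        return round(scale * sum(self.cycles) / len(self.cycles))
```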

Measures which characterize intensification and diversification are obtained from
the recency and frequency structures and used to select an inference method.
Describing any particular selection mechanism in detail would be overly specific:
capturing entire families of indexing strategies via representations such as fuzzy logic
[372] is clearly possible. Hence, we intentionally give here a qualitative description:


1. When intensifying, the system tends to: 


• apply or synthesize rules which tend to move towards goal states;
• compress known regions of the state space via union of more primitive regions;
• invent synthetic types via the application of abstraction.


2. When diversifying, the system tends to:


• invent some target type which does not overlap with any explored region;

• place low priority on goal states (e.g. potentially bypassing a theoretically
reachable goal state in favor of traversing new areas of the state space);

• synthesize new rules and types via analogy (i.e. pick two rules with a common
domain).


The heuristics above are merely intended to give a flavor of the relationship 
between metrics and strategies. Cross-domain heuristics for proactive synthesis are 
clearly of great potential value, and are rightfully the subject of an extended empirical 
investigation. 

In order to make a choice of rules and target types in the above, the selection 
mechanism must ultimately be grounded in a specific state space. There are actually 
two choices here: (1) the ‘first-order’ state space described by the current ruleset and 
(2) the infinite ‘second-order’ state space described by the prospective addition of 
rules. By means of defunctionalization [287], second-order rules of the form
$A \to (B \to C)$ can be ‘uncurried’ into first-order rules $(A, B) \to C$, thereby allowing all
selection decisions to be grounded in a first-order state space as follows:


• The state of a type is given by its predicate.

• The state of a rule is the swept surface defined by interpolating the application
of its transfer function between the domain and codomain values associated with
its predicates.

• The measure of static intensification of some newly proposed type or rule with
respect to a ruleset is a function of the degree of overlap with (i.e. polytope
intersection with) that ruleset.

• The measure of static diversification of some newly proposed type or rule with
respect to a ruleset is a function of the distance from the closest point in that ruleset.

• The dynamic analog of the above measures is considered with respect to the recent
past/near future trajectory of the state of the ruleset, optionally with some
time-discounted weighting factor.
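The two static measures can be sketched once regions are given a concrete shape. Here, as a simplifying assumption, every type or rule region is an axis-aligned box rather than a general polytope; intensification is taken as the largest fraction of the candidate's volume covered by a single ruleset region, and diversification as the Euclidean gap to the nearest region. All function names are hypothetical.

```python
import math

# A region is (lower corner, upper corner), standing in for a polytope.


def volume(box) -> float:
    lo, hi = box
    return math.prod(max(h - l, 0.0) for l, h in zip(lo, hi))


def intersection_volume(a, b) -> float:
    (alo, ahi), (blo, bhi) = a, b
    return math.prod(max(min(h1, h2) - max(l1, l2), 0.0)
                     for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))


def static_intensification(candidate, ruleset) -> float:
    """Largest share of the candidate's volume overlapped by one ruleset region."""
    v = volume(candidate)
    if v == 0.0:
        return 0.0
    return max((intersection_volume(candidate, r) / v for r in ruleset), default=0.0)


def static_diversification(candidate, ruleset) -> float:
    """Euclidean distance from the candidate to the closest ruleset region."""
    def gap(a, b):
        (alo, ahi), (blo, bhi) = a, b
        return math.sqrt(sum(max(l2 - h1, l1 - h2, 0.0) ** 2
                             for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi)))
    return min((gap(candidate, r) for r in ruleset), default=math.inf)
```

A candidate that half-overlaps an explored region in each dimension scores 0.25 for intensification and 0 for diversification, while a disjoint candidate scores the reverse pattern.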
In summary, the above allows higher-order cognition to be framed in terms of
reactive control, as applied to the representation of the reasoning process itself. 
One key issue is scalability (i.e. how to avoid ‘long-tailed’ fragmentation of the 
representation). The proposed means of addressing this is to favour the frequent 
application of abstraction as a means of imposing equivalence classes on the search 
space. 


10.4 Safety 


Safety is rightfully a dominant concern for autonomous systems. Indeed, taken to 
the extreme, ‘AI safety’ has spawned an academic subdiscipline devoted to the most 
dire of futurist predictions [30]. However, we are concerned here with the mismatch 
between requirements and the capabilities of contemporary approaches. We can 
broadly consider safety to be ‘confidence that the system will not exceed its
operational envelope’. There is clearly a continuum of formality with respect to the
degree of confidence and to the bounds on, and communicability of, the envelope. Communicability is
a bidirectional notion: how accurately has desired behaviour been specified to the 
system, and how interpretable is the system’s representation of it? 

Regarding contemporary approaches: at one extreme, we have formal methods, 
which permit the explicit delineation of behaviours, together with guarantees that 
forbidden regions of the state space will not be entered. In the manner of traditional 
symbolic systems and (idealized) theorem provers such as the Gödel machine [306],
it is possible to enshrine ‘ground truths’ within the system by equipping the seed 
with axioms and sound rules of inference. At the other end are approaches layered on 
deep (reinforcement) learning, in which behaviour is indirectly mediated via reward, 
and obtaining confidence in the behavioural envelope is an active research area. 

Considered in isolation (i.e. independent of some more autonomous invoking
architecture), these approaches are in any case only applicable in ‘closed world’
environments, the dimensions of which must be prescribed a priori by human ingenuity
and which are assumed to be unchanging thereafter.


Experiential Safety 


In the open-world setting of genuinely autonomous systems, there may always be 
possibilities not foreseeable as a function of current inference. This could be because,
e.g., the sensors are not equipped to detect the associated latent interactions a priori,
or because of the butterfly effect or emergent properties of interactions. As befits the
scientific method, knowledge obtained via the system’s own hypothesizing is therefore
provisional, and ‘facts’ are simply hypotheses that have proved useful in the context
of recent tasks. However, overriding constraints remain in place: to the best of its
knowledge, the system will never enter a forbidden region of the state space. Hence
a robot tasked with remaining upright could still fail to do so if, for example, it is
ignorant of high winds. In such an ‘experiential safety’ setting, safety guarantees apply
at both global and local levels: the local level is concerned with guarantees at the 
scale of single inference steps, the global level with the longer-term behaviour of
the system. Regarding local guarantees, the framework provided by F-algebras is of 
particular importance for general intelligence, in that it allows safety to be reconciled 
with open-endedness. Specifically: 


• Open-endedness: the algorithm template for the F-algebra of a type can be
programmatically derived [334], even if the type has been synthesized online by the
system. The template orchestrates the invocation of learned rules, as described in
Sect. 10.3.

• Safety: the interpreter defined by the algorithm template can nonetheless be
constrained to well-defined behavior, i.e., mapping only between prescribed input and
output types with required constraints.
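The two properties can be illustrated by a small sketch. The expression type (`Lit`, `Add`, `Mul`) and the helper names are our own and hypothetical. Here the functor action `fmap` is written by hand, whereas the text refers to deriving the template programmatically. The generic fold `cata` plays the role of the algorithm template. The algebra `evaluate` stands in for learned rules, and `constrained` restricts the interpreter so that every intermediate result must satisfy a prescribed output predicate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

@dataclass(frozen=True)
class Lit:
    value: float

@dataclass(frozen=True)
class Add:
    left: Any
    right: Any

@dataclass(frozen=True)
class Mul:
    left: Any
    right: Any

Node = Union[Lit, Add, Mul]

def fmap(f: Callable[[Any], Any], node: Node) -> Node:
    """Functor action: apply f to the recursive positions of one node."""
    if isinstance(node, Lit):
        return node
    return type(node)(f(node.left), f(node.right))

def cata(algebra: Callable[[Node], Any], expr: Node) -> Any:
    """Generic fold (catamorphism), i.e. the reusable algorithm template.
    It only recurses into strictly smaller subterms, so it terminates on finite trees."""
    return algebra(fmap(lambda child: cata(algebra, child), expr))

def evaluate(node: Node) -> float:
    """An algebra: interprets one layer whose children are already values."""
    if isinstance(node, Lit):
        return node.value
    if isinstance(node, Add):
        return node.left + node.right
    return node.left * node.right

def constrained(algebra: Callable[[Node], float],
                predicate: Callable[[float], bool]) -> Callable[[Node], float]:
    """Restrict an algebra so that every intermediate result satisfies `predicate`."""
    def checked(node: Node) -> float:
        result = algebra(node)
        if not predicate(result):
            raise ValueError(f"result {result!r} violates the output constraint")
        return result
    return checked

expr = Add(Lit(2.0), Mul(Lit(3.0), Lit(4.0)))
safe_eval = constrained(evaluate, lambda x: 0.0 <= x <= 100.0)
print(cata(safe_eval, expr))  # 14.0
```

Swapping the algebra changes what is computed without touching the template. The constraint wrapper is where the "well-defined behavior" of the safety property is enforced.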


Global safety properties of general interest are reachability, i.e., finding the set of
states reachable from a given initial state $X_0$, and controllability, i.e., finding the set of
states controllable to a given final state $X_f$. Modulo efficiency considerations, the
bidirectional nature of rules means that reachability and control can be considered
equivalent; in linear systems, for example, they coincide in any case (further details
of a categorical treatment in this setting can be found in Section 11.2.4). Depending
on the properties of the expression language used, variants (e.g. point-to-point
reachability) may be more computationally efficient. More generally, such guarantees
can be divided into two categories:
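The equivalence argument can be seen in a minimal sketch. It assumes a finite, explicitly enumerated transition relation, which is a strong simplification relative to rules over continuous domains. The function names are hypothetical. Reachability is a breadth-first fixed point, and because rules can be read in either direction, controllability is simply reachability on the converse relation.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Relation = Dict[Hashable, Set[Hashable]]  # state -> set of successor states

def reachable(relation: Relation, start: Iterable[Hashable]) -> Set[Hashable]:
    """All states reachable from `start` (breadth-first search to a fixed point)."""
    seen = set(start)
    frontier = deque(seen)
    while frontier:
        s = frontier.popleft()
        for t in relation.get(s, ()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def reverse(relation: Relation) -> Relation:
    """Converse relation: t -> s whenever s -> t."""
    rev: Relation = {}
    for s, succs in relation.items():
        for t in succs:
            rev.setdefault(t, set()).add(s)
    return rev

def controllable(relation: Relation, target: Iterable[Hashable]) -> Set[Hashable]:
    """All states from which `target` can be reached: reachability on the converse."""
    return reachable(reverse(relation), target)

rel = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"c"}}
print(sorted(reachable(rel, {"a"})))     # ['a', 'b', 'c']
print(sorted(controllable(rel, {"c"})))  # ['a', 'b', 'c', 'd']
```

A safety check then amounts to asking whether any forbidden state lies in the reachable set of the current state.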


Formal Reachability 


In this approach, we temporarily assume the soundness of current hypotheses
(based, for example, on the reliability of rules and the likelihood of states). In linear
continuous-time dynamical systems, reachability is decidable [131]. In this setting,
reachability can be recomputed periodically, e.g., whenever a rule or type is added or
deleted, or perhaps less frequently, as an empirically determined trade-off between
task urgency and available resources.


Time-Bounded Reachability 


For expression languages in which reachability is not statically decidable, an alter- 
native is to assume the world remains unchanged while a simulation procedure runs 
in the background: effectively iterating (perhaps some approximation of) the current 
transition relation of the system to determine which states are reached. Hence, this 
approach is likely to be of greatest value for determining the behavioural envelope 
in the near term.
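A time-bounded variant can be sketched as follows. It assumes that the current, possibly approximate, transition relation is available as a successor function `step`. The `horizon` (number of steps) and `budget` (number of state expansions) parameters are hypothetical stand-ins for the available time. As in the text, the world is assumed to stay unchanged while the simulation runs. Whatever has been found when resources run out is returned as a partial envelope, in anytime fashion.

```python
from typing import Callable, Hashable, Iterable, Set

def bounded_reachable(step: Callable[[Hashable], Iterable[Hashable]],
                      start: Hashable, horizon: int, budget: int) -> Set[Hashable]:
    """Simulate the transition relation from `start` for at most `horizon` steps and
    at most `budget` state expansions, returning the states reached so far."""
    reached = {start}
    layer = {start}
    expansions = 0
    for _ in range(horizon):
        next_layer = set()
        for s in layer:
            if expansions >= budget:
                return reached  # resources exhausted: return the partial envelope
            expansions += 1
            for t in step(s):
                if t not in reached:
                    reached.add(t)
                    next_layer.add(t)
        if not next_layer:
            break  # nothing new: a fixed point was reached before the horizon
        layer = next_layer
    return reached

# Hypothetical transition relation on an unbounded state space.
step = lambda x: (x + 1, x * 2)
print(sorted(bounded_reachable(step, 1, horizon=2, budget=100)))  # [1, 2, 3, 4]
```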
Chapter 11
Prospects


‘Understanding as Representation Manipulability’ reduces 
understanding — a notion that picks out a particularly complex 
cognitive state — to representation, inference, and object 
manipulation. 


D. A. Wilkenfeld [364] 


11.1 Summary 


Intelligence as Managing Limited Resources 


The failure of GOFAI created a vacancy for a new guiding philosophy for AI. As 
it happened, a new perspective was already waiting in the wings: having previously 
been repeatedly rejected by peer review, Rodney Brooks’s behavior-based approach 
to robotics [37] had privately gained traction, with seminal works [35, 36] setting out 
the philosophy and practice of the ‘Physical Grounding Hypothesis’ [35]. Soon, the
notion that ‘intelligence requires a body’ was common parlance in the AI community. 
In retrospect, this ostensible embrace of embodiment was rather more superficial than 
one would have hoped: Brooks subsequently observed [34] that in certain areas of 
AI research, “hardly a whiff of the new approaches can be smelled”. Relatively 
soon thereafter, nascent enthusiasm for deep learning turned attention away from the 
intrinsically ‘in vivo’ Brooksian approach towards the typically more forgiving ‘in 
silico’ environments, in which interaction with real-world objects was replaced with 
the simpler (but seemingly hard-to-generalize) task of classifying pictures of them. 

As happened with the move from analog cybernetic feedback to digital loss func- 
tions (Sect. 7.2), we claim that an essential guiding property was thereby lost, to the 
detriment of AI philosophy and practice. We have argued throughout that notions 
of ‘universal intelligence’ [194] and associated AI architectures such as AIXI [153] 
have no economic utility, since they are not in any way grounded in the finiteness 
of resources. A system that mechanistically searches vast spaces,¹ oblivious to
the ticking clock of its environment and the mortal concerns of its users, is clearly
the antithesis of an intelligent system. On the contrary, we have emphasized that the
primary drivers for an intelligent system must, by design, be the asynchronous dictates
of the user, continual changes in environment, and inevitable variations of resources;
an intelligent system must respond gracefully to each of these in a timely manner.


‘Work on Command’ Manifests General Intelligence 


Although Brooks emphasized that embodiment in real-world environments was 
essential, related work was predominantly reactive. It therefore omitted the kind
of higher-order cognition that we have argued is vital for sample efficiency. In
contrast, the SCL approach reconciles deliberative planning with asynchronous and
open-ended environments, whilst still preserving the anytime requirement of ‘work
on command’. In common with many cognitive scientists, we consider the ability to
leverage and extend knowledge across domains to
be synonymous with general intelligence [84, 148, 189]. We therefore claim that 
any meaningful notions of generality must also be grounded in a purpose-centric 
perspective. Just as Brooks insisted that ‘in vivo’ experimentation in noisy and
complex real-world settings is the proper setting for AI research, we claim that
the pragmatically meaningful setting for domain generalization is that of ‘work on
command’. In this setting, efficient knowledge transfer is a prerequisite for the
ability to complete related tasks in a timely manner. In Sect. 11.2.2
below, we consider the prospect that it may be possible to learn some ‘universal’ 
building blocks of compositional knowledge representation. 


Understanding as Representation Manipulability 


The SCL formulation effectively considers ‘understanding’ to be synonymous with 
‘representation manipulability’ [364]. Formally, one can say that an agent A under- 
stands phenomenon F in context C iff A possesses a representation R of F, to which 
A could make local modifications, within constraints specified by C, thereby producing
a representation R’ of F that enables inferences or manipulations of F towards
goals specified by C.

What is therefore required is that representations R be ‘sufficiently malleable’ 
to act as a basis for construction, common usage, and novelty (in order of depth 
of understanding required). This malleability is a Gestalt property: metaphorically, 
it must preserve the ‘essential nature’ [148] of the representation, only allowing it
to be perturbed in a way that describes a possible world.² As per the discussion in
Sect. 9.2, the production of R’ in this manner is thus synonymous with our notion of 
the functorial transformation of hypotheses. In this sense, SCL provides an exemplar 
for the research program proposed by Bundy and McNeill in 2006 [40]: 

Reasoning systems must be able to develop, evolve, and repair their underlying representations
as well as reason with them. The world changes too fast and too radically to rely on
humans to patch the representations.


¹ Whether for proofs of optimality or for solutions to a regression problem.
² As generalized from a priori seed constraints and the empirical experience of the system.
It is also illustrative to contrast this notion of ‘understanding’ with the weaker
notion of ‘robustness’ in machine learning. Robustness is typically interpreted as
invariance (e.g. to translation or rotation) and/or noise tolerance, with these together
proposed to guard against adversarial inputs in the general form of ‘single pixel
noise’ [9]. Robustness interpreted as invariance would have a classifier say a table
is a table even when it is upside down, although we understand that the essence of
‘tableness’ is that it ‘affords support’. Robustness in the sense of noise tolerance
would have a classifier say a car without wheels is still a car, although humans
immediately understand that it does not afford forward motion (q.v. [338]).

Representational issues have been long debated in AI, but—as with many related 
issues such as the symbol grounding and frame problems—many ostensible problems 
can instead be considered to be artifacts of this kind of ‘overactive reification’. 
This is exacerbated in current practice by the supervised learning preoccupation 
with ‘noun-centric’ classification, in which objects are simply assigned a nominal 
category that is disconnected from the context of end usage. As per Wilkenfeld [364],
we therefore consider ‘understanding’ to be a process: contingent on the situated
relationship between system and environment, cashed out via demonstrations of
ability on ‘analogous tasks’. 


11.2 Research Topics 


11.2.1 Choice of Expression Language 


The power of reflective reasoning in SCL is determined by the choice of expression 
language. By virtue of inductive construction, the interpretation of recursive expres- 
sions described in Sect. 9.2 is guaranteed to terminate [334]. Naturally, if one were to 
elect to use an expression language with arbitrarily expressive primitives, the ability 
to reason about them is formally bounded by Gödel’s incompleteness theorems [115,
319].³ In practice, the entities being reasoned about are grounded manipulations of
the environment, rather than abstract formal proof objects, hence it is anticipated that 
relatively simple expression languages and learned denotational interpretations will 
suffice. 

In accordance with the Curry–Howard isomorphism [176], a constrained subregion
of the state space can alternatively be considered as a type T, with the constraints
reflectively represented as predicates denoting the invariant properties of that type.
The instantiation of an object of type T is therefore synonymous with a sequence of
state space transformations, the result of which lies within a subregion of the state
space defined by T. In programming language terms, there is a range of options
for trading expressiveness of compositional properties against decidability and/or
efficient synthesis procedures [263]. For example, the current notion of ‘differentiable
programming’ can be represented via an expression language of ‘higher-order
functions which return locally-linear functions’ [77, 78]. It has also recently been
proposed [321] that the category Poly (of so-called polynomial functors, a.k.a.
dependent lenses) is particularly well-suited for representing both context-sensitive
interaction and semantic closure.

³ For completeness: some heuristics are also possible here [192].
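One familiar way to make ‘higher-order functions which return locally-linear functions’ concrete is forward-mode automatic differentiation with dual numbers. The sketch below is our own scalar illustration and not the construction of [77, 78]. `Dual` and `linearize` are hypothetical names, and only `+` and `*` are supported. `linearize` takes a function and returns a function that, at each point, yields the value together with the local linear map.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Dual:
    """A number paired with a tangent (forward-mode automatic differentiation)."""
    primal: float
    tangent: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule for the tangent component.
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

def linearize(f: Callable[[Dual], Dual]
              ) -> Callable[[float], Tuple[float, Callable[[float], float]]]:
    """Higher-order: maps f to a function returning f(x) and the linear map dx -> f'(x) dx."""
    def at(x: float):
        out = f(Dual(x, 1.0))
        return out.primal, (lambda dx: out.tangent * dx)
    return at

f = lambda x: 3 * x * x + 2 * x + 1
value, lin = linearize(f)(2.0)
print(value, lin(0.5))  # 17.0 7.0
```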

Regarding abstraction, many variant algorithms have been devised for anti- 
unification, differing in the expressiveness of the underlying expression language. 
The simplest is syntactic anti-unification, which has complexity $O(\max(|e_1|, |e_2|))$,
where $|e|$ is the number of nodes in the abstract syntax tree (AST) of an expression
$e$. Syntactic anti-unification is most useful whenever the expression language
has a normal form, i.e., for any expression $e$, there is some sequence of transformations
which yields a unique expression $e'$ that acts as a representative for all such $e$.
Languages such as the Simply Typed Lambda Calculus or System F have a normal
form [265]. More generally, guaranteed normal forms and Turing completeness are
mutually exclusive, since a guaranteed normal form implies guaranteed termination.
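Syntactic anti-unification (computing the least general generalization of two terms) can be sketched as follows. Terms are represented as atoms (strings) or tuples `(functor, arg_1, ..., arg_n)`, which is our own encoding, and the variable names `?X0`, `?X1`, ... are hypothetical. Matching structure is kept. Each disagreeing pair of subterms becomes a variable, and the same pair always maps to the same variable, so shared structure survives. Memoising pairs by value adds hashing cost, so the sketch does not aim to match the complexity bound above exactly.

```python
from typing import Dict, Tuple, Union

Term = Union[str, Tuple]  # an atom, or a tuple (functor, arg_1, ..., arg_n)

def anti_unify(e1: Term, e2: Term) -> Term:
    """Least general generalization of two terms (syntactic anti-unification)."""
    table: Dict[Tuple[Term, Term], str] = {}

    def go(a: Term, b: Term) -> Term:
        both_tuples = isinstance(a, tuple) and isinstance(b, tuple)
        if both_tuples and len(a) == len(b) and a[0] == b[0]:
            # Same functor and arity: keep it and generalize the arguments pairwise.
            return (a[0],) + tuple(go(x, y) for x, y in zip(a[1:], b[1:]))
        if not both_tuples and a == b:
            return a  # identical atoms
        if (a, b) not in table:
            table[(a, b)] = f"?X{len(table)}"  # fresh (hypothetical) variable name
        return table[(a, b)]

    return go(e1, e2)

print(anti_unify(("f", "a", ("g", "a")), ("f", "b", ("g", "b"))))
# ('f', '?X0', ('g', '?X0'))
```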


The Expression Language of ‘Conceptual Spaces’ 


As an additional example to that of first-order linear arithmetic previously described
in Section , a simple but concrete example of a possible expression language is that of
‘conceptual spaces’ [103, 104], proposed by Gärdenfors as a bridge between symbolist
and connectionist approaches. It is conjectured that naturally-occurring concepts
are characterized by subspace regions with a topological structure that is connected 
and convex. For example, the topology of color corresponds to (some variant of) the 
color wheel, that of time to the real line. Qualitative notions (e.g. ‘relative temper- 
ature’) are supported via the topology of ordered intervals. Goguen has previously 
proposed a system which uses conceptual spaces for describing anthropocentric rea- 
soning about space and time [118], observing: 


Sensors, effectors, and world models ground elements of conceptual spaces in reality, where 
the world models are geometrical spaces. This implies that the symbol grounding problem is 
artificial, created by a desire for something that is not possible for purely symbolic systems, 
as in classic logic-based AI, but which is natural for situated systems. 


Compositionality of conceptual spaces has been explicitly studied: in recent work, 
Bolt et al. [28] employ category theory to perform compositional interpretation of 
natural language via a grammar over convex relations. This work is a foundation 
for future research into highly-expressive and principled compositionality. For SCL 
purposes, the ‘expression language’ of conceptual spaces is therefore that of affine 
transformations of convex (sub)regions of the state-space, and the associated truth 
predicate can always be determined as a function of the set of points in the intersection 
of two regions [342]. Expressing the operations of SCL (as described in Sect. 7.2) 
in the language of conceptual spaces requires much less sophisticated interpretation 
than does natural language, with geometric operations being sufficient to support 
these inference mechanisms. 
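The geometric flavour of these operations can be shown in a deliberately small sketch. It assumes a two-dimensional conceptual space whose concepts are convex polygons given by counter-clockwise vertex lists. The dimensions, the concepts `red` and `crimson`, and all function names are hypothetical. Affine maps act on vertices. The truth predicate illustrated is subsumption: $A \subseteq B$ iff $A \cap B = A$, which for convex regions reduces to checking that every vertex of $A$ lies in $B$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]

def affine(polygon: Sequence[Point], m: Matrix, t: Point) -> List[Point]:
    """Apply x -> M x + t to every vertex; the image of a convex polygon stays convex
    (assuming M is non-singular)."""
    (a, b), (c, d) = m
    image = [(a * x + b * y + t[0], c * x + d * y + t[1]) for x, y in polygon]
    if a * d - b * c < 0:
        image.reverse()  # a reflection flips orientation; restore counter-clockwise order
    return image

def contains(polygon: Sequence[Point], p: Point) -> bool:
    """Point-in-convex-polygon test; vertices must be in counter-clockwise order."""
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # p must lie to the left of (or on) every directed edge.
        if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) < 0:
            return False
    return True

def is_subconcept(inner: Sequence[Point], outer: Sequence[Point]) -> bool:
    """inner is a subset of outer iff every vertex of inner lies in (convex) outer."""
    return all(contains(outer, v) for v in inner)

red = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
crimson = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
print(is_subconcept(crimson, red))  # True
shifted = affine(crimson, ((1.0, 0.0), (0.0, 1.0)), (3.0, 0.0))
print(is_subconcept(shifted, red))  # False
```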
11.2.2 Compositional Primitives


Previous chapters have argued that compositionality is key to addressing the foun- 
dational issues of generalization and representing long-tailed distributions with high 
sample efficiency. This suggests that what is desired is a collection of
representations which form a ‘basis set’, the elements of which can be composed (perhaps
nontrivially, via analogy) to describe a wide range of phenomena. We therefore claim 
that the strongest possible emphasis should be placed on the search for compositional 
primitives, i.e., compressed parametric representations of recurring phenomena, cap- 
tured across multiple sensorimotor modalities. In cognitive science, such abstractions 
are known as image schema [159, 188] and are intended to represent common pat- 
terns in the embodied experience of space, force, motion, etc. Early work in this 
area induced image schema corresponding to spatial propositions from video data 
[96, 285]. There have also been attempts to model image schema symbolically [3, 
139, 182], with recent work on a qualitative representation of containers in a sorted
first-order language [60]. It is clearly desirable that computational representations of
image schema enjoy the cross-domain ubiquity ascribed to their cognitive
counterparts. Concurrently with the development of the present work, a recent trend in deep
learning is the proposed universality of so-called ‘foundation models’ [29], which
provide a broad basis of representations for downstream tasks through large-scale 
self-supervised training. While this paradigm offers the advantage of well-known 
engineering pipelines, we saw in Chap. 4 that compositionality in the algebraic sense 
is essentially absent from deep learning, as considered across heterogeneous archi- 
tectures and arbitrary constraints. Furthermore, the full grounding of language and 
other symbols will require representations which support strong invariant propaga- 
tion and the ability to produce reasonable counterfactual statements. Since achieving 
the associated ‘malleability of representation’ enjoyed by humans has so far proved 
elusive, it is perhaps useful to focus initially on a related, but more overtly embodied 
notion: that of ‘affordances’. 


Affordances 


The term ‘affordance’ was coined by Gibson [109] to describe a relation between an 
agent and its environment, grounded by the physical embodiment of the agent and 
the recognition capacity of its perception system: 


If you know what can be done with a graspable detached object, what it can be used for, you 
can call it whatever you please. The theory of affordances rescues us from the philosophical 
muddle of assuming fixed classes of objects, each defined by its common features and then 
given a name. [...] But this does not mean you cannot learn how to use things and perceive 
their uses. You do not have to classify and label things in order to perceive what they afford. 


Although Gibson [109] initially described affordances as being ‘directly perceivable’,
it is more useful for general intelligence purposes to treat (at least ex nihilo) the
perception of an affordance as equivalent to hypothesis generation, i.e., as potentially
requiring nontrivial computational work. Evidence that affordances are not, in any
case, ‘directly perceivable’ includes:
• Tool use in crows, where previous work [301] implies that affordances are not
simply a function of the relation between body and environment, and have (at the
very least) a memetic component.

• Although the ability to create fire (from flint and tinder) or steel (from iron and
carbon) could be said to be inherent in their component parts, their manufacture
required nontrivial insight.


Hence affordances offer an overarching perspective on situated representations. 
We believe that composition of affordances is a key step towards general intelligence 
and that the category-theoretic machinery of Sect. 9.1 provides a suitable framing
for processes that abstract and generalize across complex configuration spaces. The 
initial research task is then to determine the right ‘expression language’ for describing 
the affordances of simple agents in simple domains (which are nonetheless ‘noisy 
and real-world’ [35]). 

Subject to the ability to generalize from initial results to more complex domains, it 
is then appropriate to progress from explicitly agent-centric affordances to the more 
general patterns described as image schema, which might then have a greater prospect 
of being more independent of any specific embodied configuration. By these means, 
it may be possible to determine whether image schema do indeed exist as universal 
compositional primitives and whether—as has variously been suggested [145, 228, 
229, 320]—analogy has a vital role as a universal mechanism for leveraging existing 
knowledge. 


11.2.3 Links with Behavioral Control 


For purposes of building links with existing approaches, it is illustrative to revisit 
SCL from the perspective of behavioural control of open systems. In this setting, the 
system can be seen as taking as input a continual stream of data, consisting of sensor 
and monitor states, and performing open-ended learning in the following manner: 


• Observe some input and use it to progressively learn increasingly hierarchical SCL
expressions.

• Interpret some applicable subset of SCL expressions to yield predictions and/or
effector actions.

• Feed back information about ‘surprising’ environmental transitions into the stateful
parts of the observation and interpretation processes.
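The observe/interpret/feed-back cycle can be sketched with a deliberately flat stand-in for a learned model. A frequency table over (state, action) → next-state transitions replaces the hierarchical SCL expressions, which is a major simplification. `OnlineModel` and the door example are hypothetical. Each observation is first interpreted (predicted). A mismatch is logged as a ‘surprise’ for later repair, and the model is then updated.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

class OnlineModel:
    """Toy stand-in for a learned world model: counts observed (state, action) ->
    next-state transitions and predicts the most frequently observed outcome."""

    def __init__(self) -> None:
        # (state, action) -> {next_state: count}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.surprises: List[Tuple[Hashable, Hashable, Hashable]] = []

    def predict(self, state: Hashable, action: Hashable) -> Optional[Hashable]:
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return None  # nothing learned for this situation yet
        return max(outcomes, key=outcomes.get)

    def observe(self, state: Hashable, action: Hashable, next_state: Hashable) -> bool:
        """Interpret (predict), compare with what happened, feed back, then learn."""
        predicted = self.predict(state, action)
        surprised = predicted is not None and predicted != next_state
        if surprised:
            # Feedback: record the confounding transition for later repair.
            self.surprises.append((state, action, next_state))
        self.counts[(state, action)][next_state] += 1
        return surprised

model = OnlineModel()
model.observe("door_closed", "push", "door_open")           # nothing predicted yet
print(model.predict("door_closed", "push"))                 # door_open
print(model.observe("door_closed", "push", "door_closed"))  # True (e.g. door locked)
```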


At the intersection of control theory and applied category theory, there is increas- 
ing interest in the behavioral approach, in which models are relations rather than 
merely functions. The methodological treatment due to Willems [365] comprises 
phases referred to as ‘tearing, zooming, and linking’. ‘Tearing’ is the transformation 
of observed behavior into a collection of models, performed recursively (‘zooming’) 
until some elementary level of model complexity is reached. ‘Linking’ is then the 
composition of this collection of models. The process is refined until it can obtain 
predictions which match the observed behavior. 
A previous category-theoretic treatment of control [15] has used relations on
finite-dimensional vector spaces (the category $\mathrm{FinRel}_k$). The fact that relations are
well-suited to capturing invariants is of particular interest for control and state estimation.
Most interestingly, it has been shown that an analog of Lagrangean conservation laws 
[329] can be obtained in the very general setting of typed relations [10]. Relations also 
fit well with the ‘inference as typed program synthesis’ approach of SCL, since they 
are far better suited than functional descriptions for transforming specifications (e.g. 
the ‘task language’ of work on command) into implementation (the corresponding 
hypothesis chain) [289]. 
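Linking and reading specifications back to implementations both become simple set operations when models are relations. The sketch below uses finite sets of pairs rather than linear relations between vector spaces, and the gear/speed models are hypothetical. It shows relational composition (Willems' ‘linking’) and the converse. The converse is always defined, whereas a many-to-one function has no inverse.

```python
from typing import Hashable, Iterable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]

def compose(r: Rel, s: Rel) -> Rel:
    """'Linking' two models: a relates to c iff some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

def converse(r: Rel) -> Rel:
    """Read a relation backwards; always defined, unlike the inverse of a function."""
    return {(b, a) for (a, b) in r}

def image(r: Rel, xs: Iterable[Hashable]) -> Set[Hashable]:
    """All outputs related to some input in xs."""
    xs = set(xs)
    return {b for (a, b) in r if a in xs}

# Hypothetical toy models: gear setting -> band, band -> qualitative speed.
gear_to_band = {(0, "lo"), (1, "lo"), (2, "hi")}
band_to_speed = {("lo", "slow"), ("hi", "fast")}

linked = compose(gear_to_band, band_to_speed)
print(sorted(linked))  # [(0, 'slow'), (1, 'slow'), (2, 'fast')]
# From a specification ("slow") back to candidate implementations (gears 0 and 1):
print(sorted(image(converse(linked), {"slow"})))  # [0, 1]
```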


11.2.4 Pragmatics via ‘Causal Garbage Collection’ 


As discussed in Sect. 9.2, systems that operate without ‘closed-world’ assumptions 
can always encounter transitions which confound the expectations of their world 
model. Since this necessitates some form of context-specific repair to the learned 
denotational semantics of the model, we term this a ‘pragmatic’ activity. We
now describe a prospective means of applying a repair that is then made consistent 
across the entire ruleset of the world model. 

For different kinds of algebraic structure, it is common to generalize the notion 
of ‘basis set’ familiar from linear algebra to that of ‘generators and relators’. For
example, $C_n$, the cyclic group of order $n$, has a single generator ($g$, say) and a single
relator (an equation defining equivalence in the algebraic structure), $g^n = 1$, where $1$
is the identity element of the group. This is notated as a so-called finite presentation:
$\langle g \mid g^n = 1 \rangle$. Similarly, the symmetry group of the square has presentation

$$\langle r, s \mid r^4 = s^2 = (sr)^2 = 1 \rangle$$


where r corresponds to rotation by 90 degrees and s to reflection. The relator equa- 
tions collectively define a rewriting system [12]. For certain classes of algebraic 
structure, this rewriting system can be iteratively applied to yield a unique normal 
form for any algebraic expression. Hence, since exponents in $C_3$ are taken modulo 3,
all of $g^5, g^8, g^{11}, \ldots$ are rewritten to the normal form $g^2$.
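A string-rewriting sketch of these normal forms follows. It assumes group words are written as strings over the generator letters, with the identity as the empty string. `normal_form` is a hypothetical helper that applies the first matching rule at its leftmost occurrence until no rule applies. For the square we orient the relators as `rrrr → ε`, `ss → ε` and `sr → rrrs`. The last of these encodes $sr = r^3 s$, which follows from $(sr)^2 = 1$ and $s^2 = 1$. The result is a terminating and confluent system whose normal forms are $r^i s^j$ with $i < 4$ and $j < 2$. Uniqueness of normal forms depends on those two properties, which is exactly what Knuth–Bendix completion aims to establish for less obvious presentations.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (left-hand side, right-hand side); "" denotes the identity

def normal_form(word: str, rules: List[Rule]) -> str:
    """Apply rules (first matching rule, leftmost occurrence) until none applies.

    The result is a unique normal form only if the rule set is terminating and
    confluent, which holds for the two small systems below."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = word.find(lhs)
            if i != -1:
                word = word[:i] + rhs + word[i + len(lhs):]
                changed = True
                break
    return word

# C_3 = <g | g^3 = 1>: exponents reduce modulo 3.
C3 = [("ggg", "")]

# <r, s | r^4 = s^2 = (sr)^2 = 1>, oriented so that normal forms are r^i s^j.
D4 = [("rrrr", ""), ("ss", ""), ("sr", "rrrs")]

print(normal_form("g" * 5, C3))   # gg
print(normal_form("g" * 11, C3))  # gg
print(normal_form("sr", D4))      # rrrs
```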

With regard to control, Baez [15] similarly defines generators and relators for
composition of expressions in $\mathrm{FinRel}_k$. Hence, a particular combination of
elements may be reducible to a unique simpler representation. When the system is
‘surprised’ by a discrepancy between prediction and observation, it must be because 
the current interpretation of an expression does not accord with reality. This means 
that there are latent interactions which are not captured by the default denotational 
semantics. Hence, either an existing relator is invalid (as far as the world is con- 
cerned) or else there must be additional unknown relators. It is possible in principle 
to simultaneously make all existing invalid inferences consistent by constructing a
new rewriting system. For certain classes of algebraic structure, this can be achieved via
the ‘Knuth–Bendix Algorithm for Strings’ [175]. This algorithm (strictly, procedure)
is not guaranteed to halt, and even when it does halt for a given finite presentation,
it is not possible to determine in advance how long this will take. One option would
therefore be to run it as a background monitor or (typically) low-priority task, as a
form of ‘garbage collection’ for causal inconsistencies.


11.3 Conclusion 


Recent advances in machine learning have led to it being considered synonymous 
with the entirety of artificial intelligence, at least in popular conception. However, 
as exemplified by deep learning, this represents a very specific form of program 
synthesis, in which: 


• Mission objectives/constraints are specified a priori.

• Voluminous data and potentially massive computational resources are available
for training.

• The trained system is deployed into the production environment and remains
unchanged thereafter.

• It is assumed that the training data/learning algorithm suffices for generalization
to the production environment, even over time.


In order to solve a problem via machine learning, it is therefore necessary to 
impose strong a priori constraints, both on the design space and the production 
environment. Constraining the design space of the learner is almost always done via 
specialized human labor; for example, pruning the space of possible input features, 
crafting a reward/objective function, selecting and optimizing hyperparameters, etc. 
While objectives can readily be specified for simple domains (e.g. board games), in 
complex application domains such as those in the real world, the practical difficulties 
have caused initial high expectations (e.g. for autonomous vehicles) to be repeatedly 
revised downwards. 

An additional vital concern for the artificial intelligence community is the increas- 
ing evidence that machine learning is not operating at the appropriate causal level. 
This is problematic since it is likely to lead to overfitting, something which is
further encouraged by the trend towards huge parameter spaces. More generally, machine
learning is good at manipulating data, but this has not been demonstrated to lead
to understanding, i.e., the ability to represent the space of possibilities spanned by
the constraints that are latent in the training set. This has corresponding implications
for robustness and safety, which we have discussed in detail. We have therefore argued
that, in order for machine learning to progress, it must not only embrace a stronger 
notion of causality, but also embed it in a more comprehensive learning framework 
that supports reflective reasoning, which we term ‘Semantically Closed Learning’ 
(SCL). 

It might nonetheless be claimed that the mechanisms of SCL are superfluous 
for general intelligence, given that reinforcement learning agents can be specified 
as “taking action so as to maximize an aggregate of future rewards, as a function 
of previous environmental observations” [143]. This statement is extremely broad,
and could be argued to be AI-complete in capability. However, the broadness of the 
associated function signature does not automatically imbue RL with the required 
learning ability. Indeed, we argue that the interpretation of the above specification 
via common practice has canalized the expressiveness, generalization, and learning 
efficiency of RL. This canalization proceeds via: 


• The assumption that rewards commensurate with general intelligence can
meaningfully be specified a priori.

• The notion that feedback is best propagated via numeric (indeed, often scalar)
rewards.

• The notion that ‘learn, then deploy’ is sufficient.


It could be argued that alternatives to each of these default assumptions have been 
separately explored at the boundaries of RL research. However: 


• Default practice is so strongly culturally ingrained that it is necessary to explicitly
delineate the alternatives.

• There is no singular framework that simultaneously moves beyond all of these
assumptions in an integrated manner. If all these assumptions are simultaneously
removed, we claim this inevitably requires that RL simultaneously integrates
semantic closure, anytime-bounded rationality, and second-order automation,
which then effectively makes the ‘new’ RL synonymous with the proposed
compositional framework of Semantically Closed Learning.


We have presented a roadmap which re-asserts the importance of embodiment 
and the ‘Physical Grounding Hypothesis’ [35], i.e., the necessity of making deci- 
sions, anytime, in a complex, noisy environment. We strongly believe that the arti- 
ficial intelligence community must finally embrace ‘the whole iguana’ [62] of gen- 
eral intelligence, i.e., to design systems which are capable of open-ended learning 
in inevitably-changing, real-world production environments, starting from minimal 
objectives. The value proposition for general intelligence is then the elimination of 
the need to specify meaningful reward functions upfront and maintain them in tandem 
with a changing environment, which cannot scale in practice. 

Our concept of second-order automation engineering realizes a specific implementation
of SCL as a minimal yet concrete technical design. Inspired by biological growth,
where system agency is adapted and maintained despite evolutionary pressure, we
have presented three automated procedures for autonomous system engineering:
self-identification, synthesis, and maintenance with guarantees. These constitute the
necessary developmental dynamics of machines designed to abide by the non-stationary,
arbitrary expression of value, conditioned by a reflective evaluation of risk. Lifting
this design to a fully developed general and generative system theory will be the
focus of our continuing work. This will inevitably demand a departure from the prevalent
and exclusively algorithmic (or computationalist) world-view, towards a science of
self-organized, grounded, and constructive control processes.